ARTIFICIAL INTELLIGENCE METHODOLOGIES IN FLIGHT RELATED
DIFFERENTIAL GAME, CONTROL AND OPTIMIZATION PROBLEMS


January  31, 1993 


This  is  the  final  report  for  grant  AFOSR-89-0518.  While  its  principal  portion  is  the 
last  part  of  this  report,  it  may  be  appropriate  to  summarize  first  the  achievements 
of  the  past  three  years.  We  begin  by  reprinting  the 

1. 1989-90  Annual  Technical  Report  Executive  Summary;  and  the 

2. 1990-91  Annual  Technical  Report  Executive  Summary. 

The above two items give an overview of our activities and achievements during
the first two years of this grant.

The  main  part  of  the  current  report  is  entitled 

3.  Artificial  Intelligence  Methodologies  for  Aerospace 
and  Other  Control  Systems; 


it  is  the  culmination  of  a  project  by  the  Principal  Investigator  and  one  of  his 
students.  The  major  subjects  addressed  in  this  report  are  the  following. 

3a. Neural Networks Approach to Control Systems;

3b. Differential Games with Neural Networks;

3c. Aircraft Control in the Presence of Windshear;

3d. Optimal Control in a Layered Defense System.

Concerning our other activities in the course of the year: we attempted to report
on most of them by letter to AFOSR, as they occurred (copies of the letters
enclosed). Examples are presentations given by our group and publications, of
which we have already provided copies to AFOSR. We list these here in
chronological order:

4. Teaching Neural Networks Nuclear Physics;

(an extensive undergraduate project)

5. System Identification with Dynamic Neural Networks;

(preprint)


6.  Control  and  Disturbance  Rejection  with  a  Dynamic  Neurocontroller; 

(preprint) 

7.  Maneuver  Prediction  in  Air  Combat  Via  Artificial  Neural  Networks; 

(reprint) 

8.  Adjacency  of  the  0-1  Knapsack  Problem; 

(reprint) 

9.  On  Differential  Games  With  Neural  Networks; 

(Proceedings  of  an  AFOSR  Workshop) 

10.  Character  Recognition:  Qualitative  Reasoning  and  Neural  Networks; 

(reprint) 

11. Collision Avoidance and Low-Observable Navigation
in a Dynamic Environment;

(reprint) 

12.  An  Optimization  Algorithm  with  Probabilistic  Estimation; 

(preprint) 

There  were  two  particular  activities  with  which  we  were  involved  extensively  in 
the  course  of  the  past  year: 

13.  Organizing  several  reciprocal  visits,  presentations  and  discussions  between 
our  group  and  the  Analysis  Group  at  HQ  MAC,  Scott  AFB; 

14. Co-sponsoring "ANNIE '92": an international conference on artificial
neural  networks  in  engineering. 

Finally,  we  also  established  a  working  relationship  with  three  St.  Louis  area 
companies: 

ESCO  Corporation: 

(Collaborating  with  them  to  develop  an  AI  assisted 
and  PC  based  unarmed  aircraft  defense  system) 

United Van Lines:

(Helping them to develop optimal routing and scheduling algorithms)

St. Louis Post-Dispatch:

(Attempting to apply the AI technology developed by
us to the communication needs of the 21st century)


ANNUAL TECHNICAL REPORT EXECUTIVE SUMMARY


The principal portion of this Annual Technical Report is a work by the Principal Investigator and
one of his students:


Semantic  Control  in  Continuous  Systems: 

Applications  to  Aerospace  Problems. 

This report discusses our new methodology for dealing with time dependent control and
optimization problems; and, in particular, its application to combat path planning in the
presence of multiple opposing radar coverage, with time dependent scheduling problems and
with flight and fire control via logic programming.

Preceding this report we are presenting a brief discussion, intended to explain and justify why we
felt compelled to broaden our original proposed aims and begin to consider

Stochastic Optimization Problems.

The importance of this extension seems particularly evident in the light of the tasks and missions
of

Desert Sword.

In the course of the reporting period we also submitted to AFOSR copies of the writeups of two
additional projects that we have completed. We are attaching copies of the relevant covering
letters here. These consisted of the following:

Flight and Fire Control with Logic Programming
by Ervin Y. Rodin and D. Geist; Comp. and Math. with Applications,
Vol. 20, No. 9/10, pp. 15-27, 1990.

Methods for Stochastic Optimization
by Di Yan and H. Mukai; a Technical Report by the
Center for Optimization and Semantic Control.

We also transmitted copies of the doctoral dissertation of another student of the Principal
Investigator. While that person was not supported by this grant, and his work then appeared to
have no relevance to the project at hand, we felt that since the Artificial Intelligence
methodologies employed in that dissertation were derived from ours, it may be appropriate to
present those results to the AFOSR. Now, however, with the threat of large scale Iraqi sabotage
of Middle Eastern oil fields a possibility, that dissertation may become very relevant indeed:

Acidic Deposition Control Through an Artificial Intelligence Method

by Ji-Shing Lin.

We  were  also  proud  to  report  in  the  course  of  the  past  year  that  one  of  our  graduate  students, 
Kevin Ruland, who has been involved with our research projects for two years now, was
awarded  the  very  prestigious 


Mercury  Seven  Fellowship. 

An additional item of possible relevance here is that the P.I. was elected in the course of the
reporting period to be an Associate Fellow of the

American Institute of Aeronautics and Astronautics,

and also a member of the Advisory Committee of its St. Louis chapter.


Finally, we are glad to report that our direct contacts and collaboration with various elements of
the


United States Air Force


have been increasing and it seems that our group is becoming progressively more useful to
them. In this regard, we can list the following accomplishments for the period of this report.

1. Several working visits by USAF/MAC personnel at our facilities; and several visits by us at

Scott Air Force Base,

in order to discuss research problems and results attained by us. (See attached letter by
Col. J.D. Graham, and the page after it.)


2. A Volunteer Service Agreement between Scott AFB and Washington University, as
proposed and implemented by the P.I. under this Grant.


3. A Washington University - MAC Intern Program description.

4. The nomination of the P.I., Dr. Ervin Y. Rodin, to membership in the AF Scientific Advisory
Board, by Lt. General A.J. Burschnick.

5.  We  also  list  here  the  Scott  AFB/MAC  operational  projects  on  which  we  are  currently 
working: 


Closure Optimization
Defense Courier Service
Aeromedical Evacuation


6. Finally we should also mention in this section that we are making excellent progress on
the development of integer constraint relaxation paradigms, which will be particularly
useful for the KORBX computer of Scott AFB.

Finally, we should mention here that, in addition to our previous good
Rockwell International, we have also developed close contacts and mutual
McDonnell Douglas and with Emerson Electric. Scientists and engineers from these companies
now regularly visit with us; one such group visit took place in conjunction with our full-day
presentation for visitors from

HQ, Strategic Air Command.

We are attaching a one-page informational sheet about that also.


ANNUAL  TECHNICAL  REPORT  EXECUTIVE  SUMMARY 


This  is  the  annual  report  for  the  second  year  of  our  current  three-year  grant:  thus,  several 
of  our  major  projects  are  in  the  midst  of  being  developed.  It  may  be  appropriate,  therefore,  to 
begin this report with brief descriptions of those projects that we expect to conclude in the course
of  the  coming  year.  There  are  actually  three  such  projects,  each  of  which  will  become  a  doctoral 
dissertation  under  the  guidance  of  the  Principal  Investigator: 

1. Polyhedral Computations For Many-To-Many Routing Problems;

Applications  To  Air  Transport; 

2.  Artificial  Intelligence  Methodologies  In  Control  Systems; 

3.  System  Identification  With  Dynamic  Neural  Networks. 

We  begin  our  report  by  providing  brief  descriptions  of  the  current  status  of  our  research 
for  each  of  the  above  subjects. 

Several of our projects have their genesis in our collaborative efforts with the CINCMAC
Analysis Group of HQ/MAC at Scott AFB. The formal arrangement of this collaboration was set
out in a Volunteer Service Agreement between 375 MSSQ/MSCS Scott AFB and Washington
University, the details of which were included in our annual report last year. During this past
year two groups of two senior students each, and one group of three students, performed studies
relating to the following MAC problems:

1. Defense Courier Routing Problem;

2. Closure Optimization;

3.  Operational  Support  Aircraft  Vehicle  Scheduling  Problem. 

We presented these reports to the technical staff of the CINCMAC Analysis Group in the
course  of  one  of  our  regular  meetings  with  them;  in  fact,  we  also  presented  to  them  an  entire 
written  report  about  the  first  two  of  these,  with  copies  also  provided  to  AFOSR.  For  this  reason, 
we  are  including  in  this  report  only  the  first  few  pages  of  that  transmittal.  However,  we  are 
attaching  here  a  copy  of  the  third  and  shortest  report,  which  was  not  submitted  to  the  AFOSR. 

It  may  be  appropriate  to  mention  that  a  fourth  senior  student  group  was  also  working  on 
a  project  related  to  this  grant  (but  not  related  to  our  Scott  AFB  oriented  work).  Only  the  cover 
page  of  their  report,  entitled 

Situation Assessment In Medium Range Air Combat,


is  included  in  this  Annual  Report. 


Several  of  our  research  results  were  prepared  in  the  course  of  this  past  year  for 
publication.  We  are  including  some  of  these  in  this  report.  The  publication  on 

i. Adjacency of the 0-1 Knapsack Problem,

is a byproduct of our work on Semantic Control In Continuous Systems: Applications To Aerospace
Problems, which was presented in last year's Annual Report. The next item,

ii. Differential Games and Neural Nets,

was  developed  jointly  with  McDonnell  Douglas  Missiles  Systems  Company  scientists,  and  it  is  an 
ongoing  project  and  collaborative  effort. 

Our attempts to utilize neural networks in control, optimization and differential game
type problems led us to the realization that much more powerful, self-tuning networks of this
type should be developed. This led to our first report on

iii. Neural Networks With Local Memory For Control Systems,

which  is  our  next  enclosure. 

We  reported  last  year  on  our  work  on  Tactical  Air  Combat  Maneuvers:  Recognition  And 
Guidance Via Neural Networks. This past year we attempted to utilize that same technology to
identify  the  output  of  an  arbitrary  "black  box".  A  first  step  in  that  direction  was  our  next 
included  item,  consisting  of  our  work  on 

iv.  Character  Recognition:  A  New  Approach  Using  Neural  Networks. 

The last item in this section is in fact the longest one: a detailed report on the

v. Application of Semantic Control To A Class Of Pursuer-Evader Problems.

This was a project which we undertook jointly with scientists from the ESCO
Corporation. From our point of view, the importance of the research here was in proving the
feasibility of creating a rule based expert system, which is capable of calling on exact optimization
algorithms as subroutines, and which can be implemented on small computers. This is still an
ongoing project: we expect to provide further results in next year's report. (Note: we are not
including here the lengthy appendices to this work, which consist of detailed computer listings.)


Friday, January 31, 1992


Dr. Neal Glassman
AFOSR/NM
Bldg. 410, Bolling AFB

Washington,  DC  20332-6448 

Dear  Neal: 

As  I  may  have  mentioned  to  you  in  the  past,  I  am  always  trying  to  get  as  many 
undergraduates  as  possible  to  get  involved  in  our  various  research  projects. 
Some  of  these  result  in  nice  outcomes,  and  some  are  just  so-so. 

During  the  past  year  I  encouraged  one  such  undergraduate  to  try  his  hand  at 
using  neural  nets  for  a  physics  related  problem.  Since  his  results  were  pretty  nice, 
I decided to send you and Arje a few copies, enclosed here.

With belated good wishes for the new year and best personal regards,

Sincerely yours,


Ervin  Y.  Rodin 
Professor  and 
Director,  COSC 

enc.:  3  copies  of  Teaching  Neural  Networks  Nuclear  Physics 


Washington University
Campus Box 1040
St. Louis, Missouri 63130-4899
Tel: (314) 889-6007, -5806
FAX: (314) 726-4434


Monday,  June  1, 1992 


Dr.  Arje  Nachman 
AFOSR/NM 
Bldg. 410, Bolling AFB
Washington,  DC  20332-6448 

Dear  Arje: 

Since  a  portion  of  our  research  under  our  present  grant  involves  the  tuning  and 
utilization of neural networks, and since we have made some nice strides in that
direction,  we  decided  to  publish  some  of  our  results  related  to  this  area.  So,  to  bring  this 
to  your  early  attention,  I  am  sending  you  attached  two  preprints  from  our  Center: 

1.  System  Identification  With  Dynamic  Neural  Networks;  and 

2. Control and Disturbance Rejection With A Dynamic Neurocontroller.

With best regards,

Sincerely yours,


Ervin  Y.  Rodin 
Professor  and 
Director,  COSC 

enc.:  3  copies  each  of  1.  and  2.  above 


cc. Dr. Neal Glassman



Monday,  July  6, 1992 

Dr.  Arje  Nachman 
AFOSR/NM 
Bldg.  410,  Bolling  AFB 
Washington,  DC  20332-6448 

Dear  Arje: 

I  am  sending  you  attached  three  copies  of  another  paper,  for  which  support  by  both 
AFOSR  870252  and  AFOSR  890158  was  acknowledged: 

Maneuver Prediction In Air Combat Via Artificial Neural Networks,

by  myself  and  S.  M.  Amin. 

With best regards,

Sincerely yours,


Ervin  Y.  Rodin 
Professor  and 
Director,  COSC 

enc.:  3  reprints 


cc. Dr. Neal Glassman



Wednesday, August 26, 1992


Dr. Arje Nachman
AFOSR/NM
Bldg. 410, Bolling AFB
Washington, DC 20332-6448

Dear Arje:

I am sending you attached three copies of a paper, for which support by AFOSR 890158
was  acknowledged: 


Adjacency  of  the  0-1  Knapsack  Problem, 


by  D.  Geist  and  myself. 
With best regards,

Sincerely yours,


Ervin  Y.  Rodin 
Professor  and 
Director,  COSC 

enc.:  3  reprints 


cc.  Dr.  Neal  Glassman 



Tuesday,  September  8, 1992 


Dr. Neal Glassman
AFOSR/NM
Bldg. 410, Bolling AFB
Washington, DC 20332-6448

Dear Neal:

My apologies for responding to your request late; however, I was out of town when the messages arrived.
So now here is the information for the period requested:

Publications:

"On Differential Games With Neural Networks" (with Y. Wu), AFOSR Workshop of Theory and
Applications of Nonlinear Control, St. Louis, MO, 1991.

"Character  Recognition:  Qualitative  Reasoning  and  Neural  Networks"  (with  Y.  Wu  and  S.  M.  Amin), 
Math,  and  Comp.  Modelling,  Vol  16,  No.  2,  pp.  95-104, 1992. 
ABSTRACT

ARTIFICIAL  INTELLIGENCE  METHODOLOGIES  FOR 
AEROSPACE  AND  OTHER  CONTROL  SYSTEMS 


Artificial  intelligence  methodologies  have  been  applied  to  the  modeling  and 
implementation  of  control  systems  and  differential  games  problems.  To  be  more  specific, 
artificial  neural  networks,  a  multiple  instruction  multiple  data  parallel  processor  tuned  by 
connection  weights,  are  used  to  model  a  control  system  or  used  as  an  identifier/controller 
which  functions  as  a  mapping  between  two  information  domains.  Based  on  a  new 
paradigm  of  neural  networks  consisting  of  Neurons  With  Local  Memory  (NLMs),  the 
representation  of  a  control  system  by  neural  networks  is  discussed.  Using  this 
representation,  the  basic  issues  of  complete  controllability  and  observability  for  the 
system  are  addressed.  A  separation  principle  of  learning  and  control  is  presented  for 
Networks  with  NLMs  (NNLM).  The  result  shows  that  the  weights  of  the  network  will 
not affect its dynamics. The principle may be utilized to prespecify the steady state
properties  of  the  system.  Modeled  by  NNLM,  the  resulting  system  is  a  typical  nonlinear 
one  which,  through  rigorous  mathematical  analysis,  is  shown  to  be  locally  linearizable 
via  a  regular  static  state  feedback  and  a  nonlinear  coordinate  transformation. 

Significant advances have been achieved in applying differential games theory, a theory
dealing with most conflicts in daily life, economics, military affairs, etc., to practical
problems.  In  this  dissertation,  this  theory  has  been  thoroughly  addressed  from  a  new 
point  of  view.  A  configuration,  based  on  the  paradigm  of  semantic  control,  is  proposed, 
which  can  be  used  to  derive  two  paradigms  of  differential  games  with  neural  networks. 
Generally,  two  neural  networks  are  used  in  each  of  these  two  paradigms.  One  network  is 
called  the  neural-identifier  and  it  is  used  to  identify  the  control  strategy  of  one's  opponent. 
The  other  one  is  the  neural-controller  which,  taking  the  estimate  of  the  control  strategy  of 
one's  opponent,  outputs  the  control  value  for  oneself.  The  issue  of  existence  of  solutions 
is  discussed.  To  demonstrate  the  effectiveness  of  the  method,  a  simulation  experiment 
was  carried  out  and  studied  for  a  pursuit-evasion  game  problem. 


In  Chapter  3  a  learning  control  algorithm  is  developed.  The  algorithm  can  be  used  to 
evaluate  the  weight  of  a  neural  controller  in  the  paradigms  proposed  in  the  chapter  or  in 
the  control  systems.  Using  the  learning  control  algorithm,  we  study  the  aircraft  control 
problem  in  the  presence  of  wind  shear. 

In  Chapter  4  we  shall  discuss  another  aspect  of  artificial  intelligence  techniques  in  control 
systems:  rule-based  system  in  a  class  of  pursuit-evasion  game  problems.  The  pursuit- 
evasion  game  problems  can  be  converted  to  classical  optimal  control  problems.  The 
optimal  control  solution  is  obtained.  The  solution  offers  several  advantages  such  as 
significant  time-saving  in  implementation.  Further  research  directions  are  addressed  in 
the  last  chapter. 
ARTIFICIAL INTELLIGENCE
METHODOLOGIES  FOR 
AEROSPACE  AND  OTHER 
CONTROL  SYSTEMS 

1.  Introduction 


Much effort has been directed during the past two decades toward a merger
of the areas of Artificial Intelligence and Automatic Control [4, 24, 56, 79]. Such
a merger would combine the rigorous, precise, and analytical foundation of
automatic control theory with the heuristic, qualitative and efficient reasoning aspects
of artificial intelligence. It would provide a practical, powerful mechanism
and framework, together with effective computational tools, for the modeling and
analysis of fuzzy, time-dependent, noisy and uncertain models describing physical
phenomena. The theory and applications of such a merger show the power
of the efforts in this area for modeling various real processes [25, 76, 77, 79]. This
study attempts to combine the techniques of these two areas to develop new tools,
to enhance the analysis of a mature control theory, and to apply these techniques
to real-time processes. We begin in the next section by discussing some basic issues
concerning the combination of artificial intelligence and automatic control.


1.1.  Intelligent  Control 


In this section, we shall summarize the history, research efforts, and application
aspects of intelligent control. In particular, we shall explore its relationship with
adaptive control, semantic control [76], the more closely related expert control [6],
and knowledge-based control systems [87].


Among others, Saridis [79] gave a formal definition for what he termed an
Intelligent Machine:

Definition 1.1 Intelligent Machines are machines that are designed to perform
anthropomorphic tasks with minimum interaction with a human operator.


Intelligent control then is the function that drives an intelligent machine.
Intelligent control can also be considered as a fusion between mathematical and
linguistic methods and algorithms applied to systems and processes. Intelligent
control, which is hierarchically distributed, is composed of three basic levels of
control: the organization level, the coordination level, and the execution level
(see Figure 1.1).
Figure 1.1: Hierarchical Intelligent Control System
The organization level is designed to perform such operations as receiving and
reasoning with commands, planning, making high level decisions from long-term
memories, providing feedback, and exchanging long-term memory. Probabilistic
models are provided for a mechanism so that it can select an appropriate task for
a given command. The concepts of commands, task commands, events, activities,
random variables associated with events, and functions are then introduced to
specify analytically the functions of the organizer. This level, which is designed to
imitate functions of human behavior, may be treated as an element of
knowledge-based systems. Thus, knowledge representation, knowledge flow, and knowledge
processing and management are the main activities on this level.

The coordination level is an interactive structure serving as an interface
between the organization and execution levels. It formulates the control problems
associated with the most probable, complete and compatible plan formulated at
the organization level. Several individual coordinators are associated with
specific hardware execution devices. Each of them performs a pre-specified number
of different functions. Two types of feedback information exist for each
coordinator: (i) off-line feedback information, which is fed to the organization level and
is stored in long-term memory; and (ii) on-line or real-time feedback information,
which is issued by each executor in the execution level, received by the
coordinator and stored in short-term memory. The on-line feedback information may also
be used by other coordinators for the evaluation of the overall accrued cost of the
coordination level.

The execution level executes the appropriate control functions. In particular,
optimal control theory with a non-negative functional of the system states or
with an entropy H(u) for a particular control action u(x, t) is discussed by Saridis
[79]. However, various control schemes may be employed on this level.

To represent the uncertainty which may be present on each of these levels,
Saridis introduced the concept of entropy, a probabilistic measure of uncertainty.
All levels of a hierarchical intelligent control are measured by entropies and their
rates. With the introduction of entropy, the theory of hierarchically intelligent
controls may be stated as follows:

The theory of an Intelligent Machine may be postulated as the mathematical
problem of finding the right sequence of decisions and controls for a system
structured according to the principle of increasing precision with decreasing intelligence
(constraint) such that it minimizes its total entropy.
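
The postulate above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the uncertainty at each of the three levels is summarized by a discrete probability distribution and measured by its Shannon entropy; the plan names and probabilities are hypothetical and are not taken from Saridis [79].

```python
import math

LEVELS = ("organization", "coordination", "execution")

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def total_entropy(plan):
    """Sum of the entropies at the three levels of the hierarchy."""
    return sum(shannon_entropy(plan[level]) for level in LEVELS)

def select_plan(plans):
    """Return the name of the candidate plan with minimum total entropy."""
    return min(plans, key=lambda name: total_entropy(plans[name]))

# Hypothetical candidate plans: each level carries a distribution over outcomes.
candidate_plans = {
    "plan_a": {"organization": [0.5, 0.5],
               "coordination": [0.9, 0.1],
               "execution": [1.0]},
    "plan_b": {"organization": [0.9, 0.1],
               "coordination": [0.9, 0.1],
               "execution": [1.0]},
}

best = select_plan(candidate_plans)
```

Here `plan_b` is selected because its organization level is less uncertain. A real intelligent machine would search over sequences of decisions rather than a fixed list of candidates.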

Although Saridis was the first to work on Intelligent Control Theory
in a systematic way, and has attempted to lay a mathematical foundation for
the theory, others have also actively worked in this area. Among these
are Åström [5], Fu [24], Wiener [99] and Meystel [59, 60, 61, 62]. In
[5], Åström discussed the issues of intelligent control from a more practical and
application-oriented point of view. Primarily aiming at PID controllers, automatic
tuning (e.g., the relay autotuner), adaptive control, and expert control, he
reviewed briefly the history of automatic control. He explored the realistic issues
of practical real-time processes, such as sampling period, model structure,
uncertainty, and disturbance, and, in favor of the PID controller and self-tuning regulator, he
discussed the ideas and application areas of automatic tuners and adaptive
controllers. Unlike Saridis, who combined the techniques of AI, Operational Research
and conventional Control Theory and treated the issues of intelligent control in a
more analytical and systematic way, Åström mainly discussed the problems of an
automatic tuner and adaptive controller, implying that the essential part of
intelligent control lies in the system's capacity to adapt to a time-varying/unknown
environment, be convenient to end-users, automate regulator parameter
tuning, and assume less prior information about the system to be controlled.

Although a common point can be observed for both Åström and Saridis -
adaptation to unknown/time-varying environments - Åström emphasizes
incorporating more human intelligence into the process controllers, as in the position
of an instrument engineer, while Saridis views intelligent control as an overall
structure of the whole organization, as in the position of a Chief Executive Officer.
In this sense, the category that Åström discussed as intelligent control falls into
what Saridis termed the "Execution Level"; however, Åström treated various issues
in a more detailed, practical and realistic way.

It is interesting to know that there exists another type of control system
capable of intelligence: an Expert Control System (see Figure 1.2). An intelligent
control system with the function of supervision and containing a knowledge base
is categorized by Åström [5, 6] as an Expert Control System. The object of expert
control is to encode knowledge representation and decision capabilities to allow
for automatic intelligent decisions and recommendations rather than
preprogrammed logic. The development of an expert control system is motivated by
the fact that heuristics plays an important role in PID regulators. Thus, a more
efficient, robust, yet cruder way of implementing heuristics may be needed.
Designing an expert control system, which has the capacity to orchestrate a range
of different control algorithms for different control goals, seems to be the right
answer. Analogous to an Expert System in the field of Artificial Intelligence, an
expert control system consists of the system data base, the rule base, the inference
engine, the user interface, and the planning process.


Figure 1.2: Block Diagram of an Expert Control System

In an expert control system, the system data base contains constraints on
operational sequencing; facts (or static data such as sensor measurements, tolerances,
operating thresholds, etc.); evidence (or dynamic data such as sensors, instrument
engineering reports, and laboratory and test reports); hypotheses, which are
generated and stored in the data base, e.g., various state estimates; and goals (either
static goals or dynamic goals: static goals include the wide array of performance
objectives; dynamic goals are those established on-line).

The  rule-base  of  an  expert  control  system  contains  production  rules,  such  as 
if-then  rules.  The  conditions  of  the  rules  are  usually  facts  and  hypotheses  from 
the  data  base  while  the  results  of  the  rules  are  the  actions,  such  as  activation  of 
controllers.  The  rules  may  also  be  viewed  as  functions  operating  on  the  strategies. 
The  inference  engine  has  the  same  meaning  as  its  definition  in  the  traditional 
expert  system,  which  functions  according  to  different  strategies. 
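
The structure of such a rule-base can be illustrated with a minimal sketch. It
assumes a data base held as a dictionary of facts and hypotheses, and rules whose
conditions are predicates over that dictionary. The rule names, fact names,
thresholds and actions below are hypothetical and serve only as an illustration;
they are not taken from any of the cited systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A production rule: IF condition(data base) THEN action."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # tests facts and hypotheses
    action: str                                     # e.g. a controller to activate


def infer(rules: List[Rule], data_base: Dict[str, float]) -> List[str]:
    """Single matching pass: collect the actions of every rule that fires."""
    return [rule.action for rule in rules if rule.condition(data_base)]


# Hypothetical rules for supervising a PID loop (illustrative only).
RULES = [
    Rule("large_error", lambda db: abs(db["error"]) > db["threshold"], "activate_retuning"),
    Rule("small_error", lambda db: abs(db["error"]) <= db["threshold"], "keep_pid"),
    Rule("noisy_signal", lambda db: db["noise_estimate"] > 0.5, "enable_filter"),
]

# Facts (threshold) and hypotheses (error and noise estimates) in the data base.
data_base = {"error": 2.0, "threshold": 1.0, "noise_estimate": 0.7}
actions = infer(RULES, data_base)
print(actions)
```

A real inference engine would also resolve conflicts between fired rules and
chain rules whose actions modify the data base; the sketch omits both.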

An important element of an expert control system is planning. Because expert
control systems, unlike conventional control systems, deal with a process in a
more ambiguous, more qualitative way, the planning process of an expert control
system should be implemented with this difference in mind. Various algorithms
are provided for supervision, analysis and signal generation. An expert control
system, which separates the control algorithms from the logic, decides when to
use a particular algorithm. The planning may be viewed as a search in a logic
network, forming a path to reach the goals. Its function involves issuing a
command to change the production goals and to change the process with its
requirements.

A comparison of an expert control system with an autotuner, which is what
Åström meant by an intelligent controller, is given by Åström in [5]. Although
both schemes can tune the parameters of a conventional controller, e.g. a PID
controller, an expert control system usually has a more efficient way of interacting
with a human operator because of its supervision functionality, linguistic
interaction capacity, and listing capacity. Thus, depending on the individual
application problem, one can choose an appropriate control system structure.

Another effort at combining AI techniques and control theory has been in
developing the knowledge-based nested hierarchical controller [62] for the
analysis and design of autonomous robots [59]. The structure of a multi-resolutional
(pyramidal) nonhomogeneous system of knowledge representation interacting with
a planning/control system was introduced by Meystel. A structure of this type
gives not only unique capabilities of knowledge representation but also a number
of powerful algorithmic capabilities, such as a joint planning/control structure,
planning in traversability spaces, minimum-time dynamic navigation,
knowledge-based control, and others which are promising for autonomous intelligent
machines. This type of controller, which employs joint multi-resolutional
planning-control procedures, algorithms of enhanced nested dynamic programming, the
hybrid world representation, and linguistic clauses, has been implemented in an
intelligent mobile robot, IMAS-2 [59].

Knowledge  for  the  choice  of  control  is  represented  as  a  descriptive  structure  in 
a  fuzzy  linguistic  representation  space  (FLR-Space).  This  structure  is  obtained  in 
the  form  of  a  semantic  network  from  a  set  of  texts.  The  nodes  and  the  relations 
in  the  structure  can  be  evaluated  numerically.  Time  behavior  can  be  associated 
with the structure, and hence, a function or a sequence can be
considered as a trajectory of motion starting with the initial state and ending at
a fixed state. Although other control schemes can be considered, so far only
cost-optimal control processes have been studied in [59]. The control strategies are
obtained via a sequence of Hamiltonians $H_1 \supset H_2 \supset \cdots \supset H_n$ for each of the levels
of  the  hierarchy.  The  algorithm  of  nested  dynamic  programming  provides  the 
major  mechanism  for  obtaining  the  control  strategies.  Nonhomogeneous  models 
which  are  not  in  the  form  of  a  system  of  algebraic  and/or  differential  equations  are 
used for various reasons [59]. If the analytical model is unknown, one can usually
organize  a  pseudo-analytical  model  using  tabulated  data.  It  seems  natural  to 
consider  the  use  of  production  systems  (PS)  for  matching  the  linguistic  nature  of 
the  original  world  description. 

Recent works by Rodin [76] on Semantic Control Theory have been successful
in several cases, such as [25, 77]. As another important and unique approach to
combining AI techniques and control theory, Semantic Control Theory allows for
(1) adaptation of the system to varying or unknown environments, (2) enhanced
human-machine interaction, and (3) on-line planning and goal selection. A semantic
control system usually consists of three parts (see Figure 1.3): [i] Identifier; [ii] Goal
Selector; and [iii] Adaptor. Their functions, when applied to a situation governed
by differential games (for instance), are as follows:

(i) Identifier: The Identifier block identifies, through sensors and a knowledge
base, the differential game, its parameters, targets (if any) and the role of each
player.

(ii) Goal Selector: The Goal Selector solves the differential game chosen by
the Identifier block. The results are the optimal trajectories, barriers and
controls.

(iii) Adaptor: The Adaptor determines the controls that cause each player to
"best" follow the optimal trajectory determined by the Goal Selector.

Figure 1.3: Semantic Control Paradigm
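
The division of labor among the three blocks can be sketched on a toy problem.
The example assumes a pursuit-evasion game with simple motion on a line, for
which the pursuer's best strategy is to move at full speed toward the evader and
the capture time is the initial gap divided by the speed advantage. All function
names, dictionary keys and numbers are hypothetical; the applications in
[25, 77] involve far richer games.

```python
import math


def identifier(pursuer_pos, evader_pos, prev_evader_pos, dt):
    """Identify the game parameters from (hypothetical) sensor readings."""
    evader_speed = abs(evader_pos - prev_evader_pos) / dt  # finite-difference estimate
    return {"gap": evader_pos - pursuer_pos, "evader_speed": evader_speed}


def goal_selector(game, pursuer_speed):
    """Solve the toy game: chase at full speed; capture needs a speed advantage."""
    direction = 1.0 if game["gap"] >= 0 else -1.0
    advantage = pursuer_speed - game["evader_speed"]
    if advantage <= 0:
        return {"direction": direction, "capture_time": math.inf}
    return {"direction": direction, "capture_time": abs(game["gap"]) / advantage}


def adaptor(goal, pursuer_speed):
    """Turn the selected goal into the control actually applied (a velocity)."""
    return goal["direction"] * pursuer_speed


# Pursuer at 0 with top speed 3; evader moved from 9 to 10 during one time unit.
game = identifier(pursuer_pos=0.0, evader_pos=10.0, prev_evader_pos=9.0, dt=1.0)
goal = goal_selector(game, pursuer_speed=3.0)
control = adaptor(goal, pursuer_speed=3.0)
print(game, goal, control)
```

In this toy setting the Adaptor is trivial because the pursuer can follow the
selected trajectory exactly; in a realistic system it would have to track the
optimal trajectory under vehicle dynamics and disturbances.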

Applications of semantic control theory to problems of air combat, a class of
pursuit-evasion games, can be found in [25, 77]. From these applications, one
can see that the theory does provide a powerful, fundamental framework and
mechanism for modeling a real and complex system as well as for providing on-line
adaptation to an unknown environment, on-line goal selection and implementation
of lower-level execution functions.

From the above, we have seen that several intelligent control schemes have been
proposed in the area of intelligent control. They are Intelligent Control by Saridis
[79], Automatic Tuning by Åström [5], Expert Control Systems by Åström [6, 5],
the Knowledge-based Nested Hierarchical Controller by Meystel [62] and Semantic
Control Theory by Rodin [76]. These schemes are proposed from different points
of view to deal with many complex practical systems. They have been successful
in a variety of applications [5, 25, 59, 77, 79]. A common point of these schemes
is their capability of adapting to a changing environment. In other words, they
have the so-called learning capacity. Thus, it is natural to consider the mechanism
that realizes this learning capacity. Neural networks for control systems seem to be
ideal for such mechanisms. In the next section, more details about the current
research efforts in this area will be given.

1.2.  Neural  Networks  for  Control  Systems 

In this section, a brief review and survey is given concerning current works and
expected future research trends in the area of neural networks for control
systems. The topics to be discussed include the most recent works in this area,
different types of neural controllers and various applications of these controllers.
They are [i] Learning Controllers, [ii] Recurrent Neural Networks for Control
Systems, [iii] Reinforcement Learning Controllers, [iv] Relationship Between Adaptive
Controllers and Neural Controllers, [v] Modeling and Identification, and [vi]
Cerebellum Model Articulation Controllers. Most of the works focus on applications
of neural networks in known/unknown nonlinear systems with/without noise, for
the purpose of either control or identification. Although this discussion is far from
complete in covering all aspects of the work in this area, it does indeed include
the major trends at the current time. In what follows, we shall discuss the different
topics separately.


Learning  Controller 

Most recently, Hoskins et al. [36] presented an iterative constrained inversion
technique to find the control inputs to a plant. Although the nature of their work
is similar to the work in [26], several advantages are observed in [36]. First, the
proposed controller responds on-line to changes in the plant dynamics. More
interestingly, the proposed controller is applied to generate a neural-network-based
model reference adaptive controller (NN-MRAC), which is used to control a
spring-mass-damper system in which the position response to a reference
command is the same as that of a target controller. Second, by removing the neural
network from the direct feedback path and replacing direct feedback with an
estimate and optimization, Hoskins is also the first to attempt an analytical
treatment of the stability of the closed-loop system, which is important but has no
mature solution in the current literature. Third, he also considered the issue of
"smooth control", which is generally required in some applications: the control
value computed at the current step should not vary too much from the control
value at the previous step. This is particularly true in robot control
problems. For the redundant robot control problem, one requirement is to avoid
abrupt changes of the gesture in response to the slow end-effector movement of the
arm. This requirement is not satisfied in previous works applying neural networks
to inverse kinematics problems. Although a two-stage learning strategy may
be an answer to this problem, the works by Hoskins and his coworkers did show
an advantage in this regard.
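
The smoothness requirement can be stated concretely as a bound on the change
of the control between consecutive steps, |u(k) - u(k-1)| <= d. The sketch below
enforces such a bound by clipping. It only illustrates the requirement and is not
the constrained inversion technique of [36]; the function name and the bound
`max_step` are assumptions.

```python
def rate_limited_control(u_prev, u_raw, max_step):
    """Return the control closest to u_raw that differs from u_prev by at most max_step."""
    change = max(-max_step, min(max_step, u_raw - u_prev))
    return u_prev + change


# A raw controller asks for an abrupt jump from 0 to 1; the applied control ramps up.
u = 0.0
applied = []
for u_raw in [1.0, 1.0, 1.0, 1.0, 1.0]:
    u = rate_limited_control(u, u_raw, max_step=0.25)
    applied.append(u)
print(applied)
```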

Reinforcement Learning

Reinforcement learning is one of the major neural network approaches to learning
control [43]. Although these methods originated from studies of animal learning
and from early learning control works [58], they have now become an active area of
research in neural networks and machine learning. In [43], Sutton, Barto and
Williams explained these methods as a synthesis of dynamic programming and
stochastic approximation methods and focused their discussion on the Q-learning
method, which was originally presented by Watkins in [94]. An actor-critic
learning system contains two distinct subsystems: one to estimate the long-term utility
of each state and another to learn to choose the optimal action in each state.
A Q-learning system maintains estimates of utilities for all state-action pairs and
makes use of those estimates to select actions. They viewed these methods as
an example of a direct adaptive optimal control algorithm, i.e. as an on-line
dynamic programming method and a computationally inexpensive approach to
direct adaptive optimal control, which determines the control without first
forming a system model.
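
The Q-learning idea described above can be illustrated with a minimal tabular
sketch. It assumes a hypothetical five-state chain in which the agent moves left
or right and receives a reward of 1 on reaching the last state; the environment,
learning rate, discount factor and random seed are illustrative choices and are
not taken from [43] or [94]. No model of the chain is used by the learner: the
utilities of the state-action pairs are estimated from observed transitions only.

```python
import random

N_STATES, GOAL = 5, 4   # states 0..4; state 4 is terminal
ACTIONS = (-1, +1)      # action index 0 moves left, index 1 moves right


def step(state, action_index):
    """Deterministic chain dynamics with a wall at state 0 and reward 1 at the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # utility estimate per state-action pair
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.randrange(2)  # purely exploratory behavior policy
            next_state, reward = step(state, action)
            # The terminal state has no future utility; otherwise bootstrap from the best next action.
            target = reward if next_state == GOAL else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action (0 = left, 1 = right) in each non-terminal state.
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

Because Q-learning is off-policy, the estimates converge toward the optimal
utilities even though the actions during learning are chosen at random.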

Recurrent  Neural  Networks  for  Control  Systems 

Also recently, Nikolaou et al. [73, 74] published their research on identifying and
modeling  a  chemical  process.  In  their  work,  a  recurrent  neural  network  consisting 
of  dynamic  neurons  whose  behavior  is  governed  by  the  following  set  of  differential 
equations 

£i  + 

T<  Ti  ^  Ti' 
where $i = 1, 2, \ldots, n$, is used to model chemical processes with severe nonlinearity.
Once the network is judged to approximate the process well enough, a
nonlinear controller based on the works of Isidori [38] is used to control the chemical
process so that the resulting system is linear, and thus, various synthesis methods
for linear systems can be used for the purpose of control. Their approach has
been shown to be successful by applying the network to model and identify
a continuous stirred-tank reactor (CSTR).

Although  their  work  still  falls  into  the  category  of  identifying  and  modeling 
a  system  (process)  utilizing  the  interpolation  property  of  a  neural  network,  one 
of  the  unique  features  of  their  work  lies  in  using  the  internal  information  of  the 
neural  networks,  namely,  using  the  states  of  neurons  to  construct  the  nonlinear 
controller. This approach reveals a new aspect of the work in the area of neural
networks  for  control:  how  to  efficiently  and  effectively  make  use  of  the  intelligence 
of  the  neural  networks  themselves  for  control  systems  or  how  to  utilize  the  internal 
information  of  the  network,  instead  of  viewing  the  network  as  generic  mapping,  so 
that  the  memory  capacity  and  learning  capacity  of  neural  networks  can  be  more 
fully  utilized. 

It turns out that, in the current literature on neural networks for control
systems, very few people put an emphasis on this point of view. A tremendous
amount of work has been done using feedforward neural networks as generic
mappings, and then demonstrating that such a mapping, now replaced by the fancier
name "neural networks", worked fine for some particular problems [8, 20, 70].
Typical work has been on inverse kinematics problems [30, 65]. If the plant is
known a priori, teaching the inverse dynamics of a plant to a feedforward neural
network appears easier since the input-output behavior of the plant can be utilized
as teaching signals. However, in contrast to the inverse kinematics problem, in
most constrained control applications, the functional expression of the forward
mapping $\Phi$ is generally unknown. In that case, a two-stage learning strategy has
been proposed in [21, 42, 46, 70]. In the two-stage learning strategy, two neural
networks are used. One is trained to learn the forward mapping or dynamics of a
nonlinear system. The other one is trained as a neural controller. The approach
seems promising for this kind of problem.

Therefore, the works by Nikolaou et al. [73, 74] have offered a unique approach
in  this  direction. 

Relationship between Adaptive Controllers and Neural Controllers

There is a common point between well-developed adaptive controllers and neural
controllers: both adjust their parameters on-line or recursively. Thus, in many
cases, neural controllers are very closely related to adaptive controllers. For
this reason, it is natural to consider neural controllers and adaptive controllers
together and explore their relationship. Among researchers working in this
particular area are Narendra [68, 69], Hoskins [36], Chen [14], Karakasoglu [45], Guha [31],
Bialasiewicz [11], and Sztipanovits [89]. Narendra explored how well-established
adaptive identification and control techniques can be applied to the analysis and
synthesis of dynamic systems which contain neural networks as subsystems.
Different combinations of neural networks and linear systems are considered as
models for identification and adaptive control. Detailed analysis and discussion of
those issues are given by him and Parthasarathy in [69]. Chen [14] used a
different approach to neural networks for self-tuning control systems. Two neural
networks are used for approximating the nonlinear terms of a NARMAX model.
The weights are adjusted so as to minimize both the error between the output of
the actual plant and the output of the neural network, and the error between the
output signal of the plant and the predefined signal.

Unlike  a  self-tuning  control  scheme  which  usually  requires  a  priori  information 
such  as  process  model  order,  deadtime,  and  disturbance  characteristics  as  well  as 
the  assumption  of  linearity  of  process,  a  neural  controller  has  the  advantage  that 
it usually does not require a priori information about the process to be controlled.
Comparisons have been made between the two schemes in [14, 48, 50, 91], and
attempts  have  been  made  to  combine  techniques  in  these  two  areas  [36,  48]. 

Modeling  and  Identification 

There are many neural network applications for modeling and identifying
nonlinear systems [1, 15, 28, 51, 74, 85, 86]. Most of the work on neural networks for
identification and modeling has used the universal approximation property
of feedforward networks (e.g., [16, 35, 88, 98]). A typical scheme is to
use the error between the output of the network and the output of the unknown
system to update the connection weights of the network at each step. Various
optimization methods may be employed to reduce the output error by adjusting
the interconnection weights. Among them are the gradient descent algorithm,
the conjugate gradient algorithm and Davidon's algorithm. Depending on how
the weights are updated, there are two different schemes for training the neural
networks: Pattern Learning [78, 97] and Batch Learning [32, 96]. Pattern
learning is the method in which the weights of the network are adapted immediately
after each pattern is fed in. Batch learning, however, takes all the data as
a whole batch, and the network is not updated until the entire batch of data is
processed. Qin et al. [75] discussed the relationship between Pattern Learning and
Batch Learning for dynamic system identification. Four basic learning methods
have been used in their work, depending on the scheme of system
identification using neural networks. In [69], Narendra and Parthasarathy discussed the
use of neural networks for dynamical system identification and control.
Generalized Neural Networks have been proposed, which are various combinations of
linear dynamic systems and feedforward networks. Chen et al. [15] have developed
a prediction error algorithm for system identification, in which the networks are
primarily used as universal approximators for nonlinear systems.
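
The difference between the two training schemes can be made concrete with a
minimal sketch. It assumes a one-parameter linear model y = w x fitted by
gradient descent on the squared output error; the model, learning rate, number
of epochs and data are illustrative assumptions and do not reproduce the methods
of [75] or the other cited works.

```python
def pattern_learning(data, w=0.0, lr=0.1, epochs=50):
    """Pattern (on-line) learning: adapt the weight immediately after each pattern."""
    for _ in range(epochs):
        for x, y in data:
            w += lr * (y - w * x) * x
    return w


def batch_learning(data, w=0.0, lr=0.1, epochs=50):
    """Batch learning: accumulate the error over all patterns, then update once."""
    for _ in range(epochs):
        mean_gradient = sum((y - w * x) * x for x, y in data) / len(data)
        w += lr * mean_gradient
    return w


# Input-output samples of the "unknown" system y = 2 x.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w_pattern = pattern_learning(data)
w_batch = batch_learning(data)
print(w_pattern, w_batch)
```

Both schemes recover a weight close to 2 here. They differ in when the update is
applied, which matters for on-line identification, where data arrive one pattern
at a time.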

A unique approach has been employed by Specht in [85]. A one-pass neural
network learning algorithm similar to [84] has been used to estimate
continuous variables. Depending on the variables used, the networks can be utilized for
prediction, modeling, mapping, and interpolation, or as a controller. Specht
discussed a memory-based network that provides estimates of continuous variables
and converges to the underlying (linear or nonlinear) regression surface. This
network, called the General Regression Neural Network (GRNN), is a one-pass learning
algorithm with a highly parallel structure. Thus, it features fast learning that
does not require an iterative procedure: the network "learns" in one pass through
the data and can generalize from examples as soon as they are stored.
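
The one-pass character of the GRNN can be illustrated with a short sketch of
the underlying kernel regression estimate: "training" consists of storing the
samples, and the estimate is a Gaussian-weighted average of the stored outputs.
The sketch assumes scalar inputs; the sample data and the smoothing parameter
`sigma` are illustrative assumptions.

```python
import math


def grnn_predict(x, samples, sigma=0.5):
    """Estimate y at x as a Gaussian-kernel-weighted average of the stored outputs."""
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi, _ in samples]
    total = sum(weights)  # nonzero as long as x is not extremely far from all samples
    return sum(w * yi for w, (_, yi) in zip(weights, samples)) / total


# One pass through the data: the samples of y = x^2 are simply stored.
samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]
estimate = grnn_predict(1.5, samples)
print(estimate)
```

A small `sigma` makes the estimate follow the stored samples closely, while a
large `sigma` smooths it toward the mean of the stored outputs.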

Although most of the work in this direction is based on the universal
approximation property of feedforward neural networks, several specific neural network
architectures have been used: [i] Feedforward Neural Networks (e.g., [1, 15, 51]);
[ii] GRNN [85]; and [iii] Recurrent Neural Networks (e.g., [74]). Different
architectures of neural networks suit different applications. For
example, the neural networks proposed by Nikolaou et al. have the advantage that
the internal variables can be readily used for constructing a linearizing controller
such that the overall system is linearized.

Likewise, the neural network approach for identifying and modeling nonlinear
systems has the advantage that no a priori information about the model structure
is  needed.  Important  works  for  modeling  and  identification  using  neural  networks 
can  be  also  found  in  [03.  07]. 

Cerebellum  Model  Articulation  Controller  (CMAC) 

Another type of neural network for control systems which is worth mentioning
is the so-called Cerebellum Model Articulation Controller (CMAC) [67].
It was invented in 1975 by James Albus [2], then with the National Bureau of
Standards. Albus's scheme was based on a model of human memory and human
neuromuscular-control principles. The term Cerebellum Model Articulation
Controller, or CMAC, is often interpreted to mean Cerebellar Arithmetic Computer.
CMACs were originally developed for robot control, and they have been
popularized by a group at the robotics laboratory of electrical and computer engineering
at the University of New Hampshire under the direction of W. Thomas Miller III.

CMACs enjoy the reputation of having a much faster training time (several
orders of magnitude) than Feedforward Neural Networks (FFNNs) trained by
backpropagation [13], yet give the same performance as FFNNs. This property is
particularly useful for real-time learning and control problems, e.g. in an
adaptive flight-control system. CMAC neural networks are also capable of effectively
organizing and implementing a multi-dimensional function approximation in a
computationally efficient manner using traditional computing architectures.

A unique approach to neural network controller design has been employed by
Kraft  and  Campagna  [50]  to  study  the  performance  of  this  type  of  controller  in 
a  nonlinear  system  corrupted  with  noise.  The  basic  idea  behind  their  work  is  to 
generate  an  approximation  to  a  characteristic  system  surface  from  input-output 
measurements  and  then  use  the  surface  as  feedforward  information  to  calculate 
the appropriate control signal. The characteristic system surface is, in fact, the
system  equation  representing  the  known/unknown  plant  to  be  controlled.  If  the 
values  of  the  system  parameters  were  known,  the  surface  could  be  precalculated 
and  stored  in  memory.  Then,  given  the  control  objective  (i.e.  the  desired  position 
in  memory),  it  would  be  possible  to  look  up  in  memory  the  correct  control  signal. 
When the system parameters are unknown, the surface must be "learned" from
input-output  data  in  real  time.  The  controller,  similar  to  the  work  of  Miller  [66], 
uses  a  memory  update  algorithm  which  updates  the  values  of  a  group  of  memory 
locations  near  a  selected  memory  cell  during  each  control  cycle,  using  the  concept 
of  generalization. 
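
The table look-up and generalization mechanism can be illustrated with a minimal
one-dimensional CMAC sketch. It assumes several overlapping tilings of the input
axis, each shifted by a fraction of the tile width, with the output formed as the
sum of one weight per tiling. The class name, parameters and training data are
hypothetical; this is a generic CMAC in the style of Albus, not the specific
controller of [50] or [66].

```python
class CMAC1D:
    """Minimal CMAC over a scalar input with n_tilings overlapping tilings."""

    def __init__(self, n_tilings=8, tile_width=1.0, n_cells=64, beta=0.5):
        self.n_tilings = n_tilings
        self.tile_width = tile_width
        self.n_cells = n_cells
        self.beta = beta  # learning gain
        self.weights = [[0.0] * n_cells for _ in range(n_tilings)]

    def _active_cells(self, x):
        """One active memory cell per tiling; tiling t is shifted by t * width / n_tilings."""
        cells = []
        for t in range(self.n_tilings):
            offset = t * self.tile_width / self.n_tilings
            cells.append(int((x + offset) // self.tile_width) % self.n_cells)
        return cells

    def predict(self, x):
        """Output is the sum of the weights stored in the active cells."""
        return sum(self.weights[t][i] for t, i in enumerate(self._active_cells(x)))

    def train(self, x, target):
        """Spread the output error equally over the active cells."""
        error = target - self.predict(x)
        for t, i in enumerate(self._active_cells(x)):
            self.weights[t][i] += self.beta * error / self.n_tilings


cmac = CMAC1D(beta=1.0)
cmac.train(2.0, 4.0)       # a single training sample
print(cmac.predict(2.0))   # the trained point
print(cmac.predict(2.5))   # nearby input: shares half of the active cells
print(cmac.predict(10.0))  # distant input: shares none
```

Updating a group of neighboring memory locations in this way is what lets a
training sample influence the response to nearby inputs while leaving distant
inputs untouched.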


Kraft and Campagna [50] compared this type of neural network controller
with two traditional adaptive control systems, the Self-tuning Regulator and the
Model Reference Adaptive Controller (MRAC), in a study of the behavior of a
first-order system with/without nonlinearity, presented with/without noise. Results
showed that the CMAC neural controller performed equally well in the presence
of noise, and worked extremely well for a nonlinear system, compared with the
two traditional adaptive controllers. Although, unlike MRACs, this controller has
no stability guarantee, implementation speed comparisons favored the
neural network approach because the control signal can be generated virtually as a
table look-up procedure. Moreover, with the neural network controller approach,
no a priori information about the system to be controlled is needed. Thus, the
neural network controller is suitable for a wide class of nonlinear systems. Above
all, their results indeed reveal some interesting aspects of the neural network
approach for control systems.

More recently, using B-Spline receptive field functions in conjunction with
more general CMAC weight addressing, Lane et al. [52] developed higher-order
CMAC neural networks that can learn both functions and function derivatives.
The number of weights addressed in computing a network output grows
exponentially with the number of input dimensions. Back-propagation BMACs with
higher-order receptive field functions on only selected network inputs and
Spline-Net network architectures were proposed as potential solutions to problems of
more modest size, producing piecewise linear and additive function
approximations.

1.3. Organization of the Dissertation

The  purpose  of  this  study  is  to  model  and  to  analyze  control  systems  aided  by 
neural  networks.  The  approaches  attempt  to  explore  use  of  the  features  of  parallel 
architecture  in  control  systems.  It  is  organized  into  five  chapters. 

The second chapter is a study in modeling control systems using neural
networks which have a highly parallel structure and are capable of learning and
storing information. The study is in the spirit of fully utilizing the intelligence
of the networks and the pattern of processing information in parallel inside the
networks. We go beyond using the universal approximation property of neural
networks, and also consider the internal state information of the recurrent neural
networks so that a control system can be modeled using this highly parallel
structure of computation mechanism. Based on a new paradigm of neural networks
consisting of Neurons With Local Memory (NNLMs), the representation of a control
system by neural networks is discussed. Using this representation, the basic issues
of complete controllability and observability for the system are addressed. A
separation principle of learning and control is presented for NNLM. The result shows
that the weights of the network will not affect its dynamics. The principle may
be utilized to prespecify the steady state properties of the system. Modeled by
NNLM, the resulting system is a typical nonlinear one that, through mathematical
analysis, can be shown to be locally linearizable via a regular static feedback and
a nonlinear coordinate transformation. Although the theoretical results in Chapter
2 are not directly used in Chapter 3, they do have potential applications for the
differential game problems. For example, pursuit-evasion games can be modeled
by NNLMs while controllers can then be designed using various techniques.

The third chapter of the dissertation develops another new paradigm and
tools for applying neural network techniques in traditional differential game
problems. During the past few years, attempts have been made to utilize the powerful
qualitative reasoning and heuristic search capacities in the area of artificial
intelligence to overcome the difficulties of applying differential game theory in practical
problems, such as cumbersome computations [77, 82, 95]. A configuration, based
on the paradigm of Semantic Control, is proposed. It can be used to derive two
paradigms of differential games with neural networks. Two neural networks are
used in each of these two settings. One network is called the neural-identifier,
which is used to identify the control strategy of the opposing player. The other
one is the neural-controller which, taking the estimate of the control of the other
player, outputs the real control value for its own player. The issue of existence
of a solution is discussed. To demonstrate the effectiveness of the method, a
simulation experiment is carried out and studied for a pursuit-evasion problem. In
this chapter, a learning control algorithm is also developed. The algorithm can be
used to evaluate the weights of a neural controller in the paradigms proposed in
the chapter or in other control systems. Using the learning control algorithm, we
study the aircraft control problem in the presence of windshear.

The  fourth  chapter  is  a  study  of  optimal  control  and  optimization  problems 
in  the  Layered  Defense  Project.  The  Layered  Defense  Project  is  a  cooperative 
effort  between  the  Center  for  Optimization  and  Semantic  Control  at  Washington 
University  and  the  Electronics  and  Space  Corporation.  Based  on  the  semantic 
control  theory,  the  project  is  to  model  and  study  a  class  of  pursuit-evasion  game 
problems. The third part of this dissertation discusses the optimal control
problems arising from the project. Classical line-of-sight coordinates are employed to
model  the  game  situation.  Based  on  a  similar  study  in  [18],  an  optimal  control 
law was derived for the one-pursuer and one-evader case. A non-derivative
optimization method is used for finding the optimal initial costates for the optimal
control  law. 

The fifth chapter summarizes our work. The main contributions of this
dissertation are enumerated and future research directions are presented in this chapter.

2. Neural Networks Approach to Control Systems


This chapter presents a new approach to neural networks for control systems. Based
on a paradigm of neural networks, NNLM, consisting of neurons with local
memory, a control system represented by neural networks is discussed. With this
representation,  the  basic  issues  of  complete  controllability  and  observability  for 
the  system  are  addressed.  For  the  first  time,  a  separation  principle  of  learning  and 
control  is  presented  for  NNLM,  and  the  principle  shows  that  the  weights  of  the 
network  will  not  affect  its  dynamics.  Because  of  the  nonlinearity  of  the  network,  it 
is  natural  to  consider  the  issue  of  linearization  around  a  local  equilibrium  point. 
A  detailed  and  rigorous  analysis  of  the  local  linearization  via  a  regular  static 
feedback  and  a  nonlinear  coordinate  transformation  is  given  in  the  final  section. 

2.1.  Background 

This section briefly recalls three types of neurons commonly used in
feedforward and recurrent neural networks: McCulloch-Pitts neurons,
Grossberg's neurons and Hopfield's neurons. The well-known McCulloch-Pitts
neurons, which take the weighted sum of inputs and give the output through a
transfer function, are the basic elements in feedforward neural networks. They
have been widely and successfully used, and their structure is well known.
Although there are various architectures to connect the neurons (see the structures
in backpropagation networks, Kohonen networks and Hopfield networks), the
basic elements, the neurons, remain the same.


When he studied the famous dog-saliva-food biological phenomenon, Grossberg
proposed a new type of learning rule, known as the Grossberg Learning Law,
as well as a new type of neuron in order to attempt to mathematically formulate
Hebb's law. His approach, in turn, attempted to explain the classical
conditioning behaviors discovered by Pavlov. Since we shall not discuss the learning law in
detail, interested readers are referred to [47].

The  neurons  proposed  by  Grossberg  are  not  simply  of  the  McCulloch-Pitts 
type,  as  their  outputs  are  described  by  a  different  set  of  equations.  Consider  a 
neuron  which  has  a  number  of  inputs  coming  from  other  neurons  in  the  network, 
as  well  as  an  external  input  coming  from  outside  the  network.  The  following 
equation describes the dynamics of the ith neuron:

$$\frac{dy_i(t)}{dt} = -a\,y_i(t) + \text{[weighted-input term illegible in source]} + I_i(t), \tag{2.1}$$

where $y_i(t)$ is the output of the ith neuron, $I_i(t)$ is an external input to the ith
neuron, and $w_j$ is the weight connecting the output of the ith neuron to the input
of some other neuron. The difference between the McCulloch-Pitts neurons and
those proposed by Grossberg is clear, since "dynamics" are incorporated in each
of the Grossberg neurons. These dynamics are represented by a positive constant
$a$ which controls the decay of the output in the absence of any other input. Thus,
$a$ may also be called a forgetting factor. This type of neuron, together with
Grossberg's learning law, gives a plausible mathematical formulation for Hebb's
law and thus forms a satisfactory connection with Hebb's learning theories.

Later (in 1984), John Hopfield proposed a general structure for a continuous
deterministic model. This structure is known as the Hopfield model. A Hopfield
model is a two-layer network in which the neurons in the hidden layer are fully
connected to each other. The input-output relationship of the ith neuron in the
network, realized by an amplifier, is described by the set of nonlinear dynamic
equations

$$C_i \frac{du_i}{dt} = \sum_j T_{ij} V_j - \frac{u_i}{R_i} + I_i, \qquad V_i = g_i(u_i), \tag{2.2}$$

where $C_i$ is the total input capacitance of the amplifier, $T_{ij}$ is the strength of the
connection from the output of the jth amplifier to the input of the ith amplifier,
$u_i$ is the input to the ith amplifier, and $V_j$ is the output of the jth amplifier. Also,
$R_i$ is a resistance value, $I_i$ is the external input current to the ith amplifier,
and $g_i$ is the sigmoidal transfer function of the ith amplifier, assuming a negligible
response time. A commonly known property of the Hopfield network is that
the state of the network can be attracted to an equilibrium point corresponding
to a local minimum of the energy function, and hence the network can be used
to implement a content addressable memory. Based on this property, Hopfield
networks have been used satisfactorily for traveling salesman problems [33, 34],
for an A/D converter [90], signal decomposition [90], linear programming [90],
and various combinatorial optimization problems [00].

As we shall see in subsequent sections, the neurons introduced below are different
from McCulloch-Pitts neurons, Grossberg neurons and Hopfield neurons. In
some sense, they are closest to the neurons in the Hopfield model, as they can be
viewed as a discrete-time version of the neurons in the Hopfield model. But unlike
those in the Hopfield model, these neurons are used in a feedforward network in
which the well-known backpropagation algorithm can be employed to change the
weights. More importantly, they are used here in a novel attempt to represent
a control system by a neural network. In fact, the idea of representing the
internal states as a state vector is not new. Hopfield used the same idea when he
used a state vector to construct an energy function in his network. He proved, by
using Liapunov stability theory, that the state will eventually converge to a local
equilibrium in state space, which corresponds to a local minimum of the energy
function. The neurons proposed below are used for representing a control system
and are the basic elements for a feedforward network which, unlike the Hopfield
network, does not have an equilibrium.

2.2.  Neurons  with  Local  Memory  (NLM) 

The term Neurons with Local Memory (NLM) comes from the presence of
dynamics inside each of the neurons we are interested in. The incorporation of
dynamics inside each neuron is the main distinction between these neurons and
the conventional McCulloch-Pitts neurons. As we shall see below, this type of
neural network facilitates much of the subsequent analysis of neural networks for
control systems. The incorporation of dynamics in each neuron results in the flow
of outputs from neurons even without any inputs. Thus, the NLMs may also be
termed dynamical neurons or active neurons.

Interestingly, a similar idea has been used by Nikolaou et al. in [?, 71] to
identify the dynamics of a continuously stirred reactor (CSTR). In their work,
Nikolaou et al. used a neural network whose neurons have the following set of
differential equations

$$\frac{dx_i}{dt} = -\frac{x_i}{T_i} + \frac{F_i\big(\sum_j w_{ij} x_j\big)}{T_i} + \frac{u}{T_i}, \tag{2.3}$$

for $i = 1, 2, \ldots, n$. Although their work has been successful in identifying the
dynamics of the CSTR, they have not discussed basic issues of a control system such
as controllability and observability. In this study, we shall discuss the basic issues
of control systems associated with this type of neural network in a more analytical
and systematic way.

A typical representation of an input-output relationship for the conventional
McCulloch-Pitts neurons is written as

$$y_k^j = F^j(u_k^1, \ldots, u_k^m), \quad k \in Z, \tag{2.4}$$

where the $y$'s and $u$'s are the outputs and the inputs, respectively. Also, $Z$ is the set
of positive integers, the subscript $k$ denotes the time step $k$, and the superscript
$j$ denotes the jth neuron. A typical form for $F^j$ of (2.4) can be written as

$$y_k^j = s_j\Big(\sum_{i=1}^{m} w_{ji} u_k^i\Big), \tag{2.5}$$

where $s_j$ is a sigmoidal function, and the $w_{ji}$'s are the synaptic weights.

A basic structure of an NLM is shown in Figure 2.1, where $j$ denotes the jth
neuron. The quantities $y_k^j$ and $u_k^1, \ldots, u_k^m$ are the output and inputs to the neuron
at time step $k$, respectively. Also, $z^{-1}$ denotes the backshift operator and $s_j^{-1}$
denotes the inverse of the transfer function for the neuron $j$.

The output $y_k^j$ of an NLM can be written as

$$y_k^j = s_j\Big(a^j s_j^{-1}(y_{k-1}^j) + c^j \sum_{i=1}^{m} w_{ji} u_k^i\Big), \tag{2.6}$$

where $a^j$ is a scalar whose value represents the dynamics in neuron $j$, $c^j$ is another
scalar, and the $w_{ji}$'s are the weights of connection from other neurons to neuron
$j$. By setting $a^j = 0$ and $c^j = 1$, we immediately obtain the conventional
input-output relationship for the McCulloch-Pitts neurons. It follows that the
input-output relationship of a conventional neuron is actually a special case of that of
an NLM.

Figure 2.1: Basic Structure of an NLM

An alternative and more informative input-output representation of an NLM
can be given by introducing an internal state variable $x_k^j$:

$$x_k^j = a^j x_{k-1}^j + \sum_{i=1}^{m} w_{ji} u_k^i,$$
$$y_k^j = s_j(c^j x_k^j), \quad k \in Z, \tag{2.7}$$

from which (2.6) can be derived easily. The system equation (2.7) is called the
node system. Again setting $a^j = 0$ and $c^j = 1$ in (2.7), we obtain the input-output
relationship of a conventional neuron.
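
As a minimal sketch of the node system, the Python fragment below assumes the state update $x_k^j = a^j x_{k-1}^j + \sum_i w_{ji} u_k^i$ with output $y_k^j = s_j(c^j x_k^j)$, and takes the logistic function as the sigmoidal transfer function. The function names, weights and inputs are illustrative choices, not values from this report.

```python
import math


def sigmoid(z):
    """Logistic function, one common choice for the transfer function s_j."""
    return 1.0 / (1.0 + math.exp(-z))


def nlm_step(x_prev, inputs, weights, a, c, s=sigmoid):
    """One step of the assumed node system: returns (x_k, y_k)."""
    x = a * x_prev + sum(w * u for w, u in zip(weights, inputs))
    return x, s(c * x)


def mcculloch_pitts(inputs, weights, s=sigmoid):
    """Memoryless neuron: y = s(sum_i w_i u_i)."""
    return s(sum(w * u for w, u in zip(weights, inputs)))


weights = [0.5, -1.2, 0.8]   # illustrative values only
inputs = [1.0, 0.3, -0.7]

# With a = 0 and c = 1 the stored state is ignored, so the NLM
# reduces to the conventional neuron.
_, y_static = nlm_step(x_prev=123.0, inputs=inputs, weights=weights, a=0.0, c=1.0)

# With 0 < |a| < 1 the internal state keeps decaying geometrically
# after the input has been removed (the "local memory").
x, trace = 0.0, []
for k in range(5):
    u = inputs if k == 0 else [0.0, 0.0, 0.0]
    x, y = nlm_step(x, u, weights, a=0.5, c=1.0)
    trace.append(x)
```

Here `trace` holds the internal state after the single input pulse; each entry is `a` times the previous one, which is the sense in which such a neuron stays active without inputs.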

The advantage of the representation (2.7) over the representation (2.6) becomes
apparent once the internal state $x_k^j$ is introduced. The system (2.7) actually has the
standard state equation and output equation familiar to control engineers. For
convenience, we still adopt the same name, "state equation", for the $x$ equation.
The role of $a^j$ in (2.7) is clear from familiar control theory. For example,
a necessary condition for the node system to be asymptotically stable is that
the $a^j$'s lie inside the unit disc in the complex plane. Even though the state
equation in (2.7) is linear and time-invariant, the output equation is nonlinear,
which complicates further analysis. Although we may assume that $s_j$ is linear,
which is the case in part of our following analysis, we shall generally consider $s_j$
to be nonlinear, e.g. a commonly used sigmoidal function.

2.3.  Networks  with  NLMs  (NNLM) 

Having defined the basic structure for NLM in the previous section, we can now
construct a neural network whose elements are NLMs. We shall denote the NNLM
with $m$ inputs, $n$ hidden nodes and $p$ outputs by $N_{m,n,p}$. For simplicity, we only
consider the single-input and single-output (SISO) system in this section. The
generalization to the multi-input and multi-output (MIMO) system is
straightforward. Meanwhile, the input to the network generally takes arbitrary values.

Figure 2.2: General Structure of NNLM

A general structure for NNLM is shown in Figure 2.2. The state equations are:

node 0:

$$x_k^0 = a^0 x_{k-1}^0 + u_k, \qquad y_k^0 = s_1(c^0 x_k^0),$$

node 1, ..., node n-2:

$$x_k^i = a^i x_{k-1}^i + w_{1i}\, y_k^0, \qquad y_k^i = s_2(c^i x_k^i), \quad i = 1, 2, \ldots, n-2,$$

node n-1:

$$x_k^{n-1} = a^{n-1} x_{k-1}^{n-1} + \sum_{i=1}^{n-2} w_{2i}\, y_k^i, \qquad y_k = s_3(c^{n-1} x_k^{n-1}),$$

where the $a^i$'s are scalars representing the dynamics of the ith node system, the
$s_j$'s are the transfer functions, which are generally sigmoidal functions, and the
$w_{ij}$'s are the synaptic weights for the path connecting adjacent layers.


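Assuming for a moment that the transfer functions $s_1$, $s_2$ and $s_3$ are all linear
and defining the state-variable vector $x_k$ by $x_k = [x_k^0, x_k^1, \ldots, x_k^{n-1}]^T$, we can represent
the node system in a more concise form by

$$x_k = A x_{k-1} + B u_k, \qquad y_k = C x_k, \tag{2.8}$$

where

$$A = \begin{bmatrix}
a^0 & 0 & \cdots & 0 & 0 \\
w_{11}c^0a^0 & a^1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
w_{1(n-2)}c^0a^0 & 0 & \cdots & a^{n-2} & 0 \\
\alpha a^0 & w_{21}c^1a^1 & \cdots & w_{2(n-2)}c^{n-2}a^{n-2} & a^{n-1}
\end{bmatrix}, \quad
B = \begin{bmatrix} 1 \\ w_{11}c^0 \\ \vdots \\ w_{1(n-2)}c^0 \\ \alpha \end{bmatrix},$$

$$C = \begin{bmatrix} 0 & 0 & \cdots & 0 & c^{n-1} \end{bmatrix}, \qquad
\alpha = \sum_{i=1}^{n-2} w_{1i} w_{2i} c^0 c^i.$$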


Equation (2.8) represents a linear state and output equation with the transfer
matrix $A$ being a lower-triangular one. By assigning the $a^i$ (for $0 \le i \le n-1$) in $A$, we
can alter the dynamics in (2.8). Assuming that $a^{n-1} \ne a^i$ for $i = 0, \ldots, n-2$, we
define the quantity $\alpha_c$ as follows:

$$\alpha_c = \sum_{i=1}^{n-2} \frac{w_{1i} w_{2i} c^i}{a^{n-1} - a^i}. \tag{2.9}$$

The quantity $\alpha_c$ plays a key role in our subsequent discussions. As mentioned in
the beginning of this section, $N_{m,n,p}$ denotes the NNLM with $m$ inputs, $n$ hidden
nodes and $p$ outputs. Based on the analysis of controllability and observability
in the next section, we immediately have the following:


Theorem 2.1 Suppose that

[i] All transfer functions $s_j$ are linear.

[ii] $w_{ij} \ne 0$ for all $i, j$.

[iii] $c^i \ne 0$ for all $i$.

[iv] $\alpha_c \ne 0$ and $\alpha_c < \infty$.

[v] $a^i \ne a^j$ for $i \ne j$.

Then, any strictly proper SISO linear system with real and nonrepeating eigenvalues
can be realized by $N_{1,n-2,1}$, where the $a$'s are the eigenvalues of the system.

Proof: Because the system (2.8) is completely controllable and observable (see
Theorems 2.2 and 2.3 in the next section), the transfer function $C(sI - A)^{-1}B$
has no pole-zero cancellation between its numerator and denominator. Since the
order of the denominator polynomial is $n$, it represents a typical nth-order rational
transfer function.


We give an example for the case $n = 4$. The matrices $A$, $B$ and $C$ in this case are

$$A = \begin{bmatrix}
a^0 & 0 & 0 & 0 \\
w_{11}c^0a^0 & a^1 & 0 & 0 \\
w_{12}c^0a^0 & 0 & a^2 & 0 \\
\alpha a^0 & w_{21}c^1a^1 & w_{22}c^2a^2 & a^3
\end{bmatrix}, \quad
B = \begin{bmatrix} 1 \\ w_{11}c^0 \\ w_{12}c^0 \\ \alpha \end{bmatrix}, \quad
C = \begin{bmatrix} 0 & 0 & 0 & c^3 \end{bmatrix},$$

where $\alpha = w_{21}c^1w_{11}c^0 + w_{22}c^2w_{12}c^0$, and the transfer function is

$$\frac{c^3\,(b_3 s^3 + b_2 s^2 + b_1 s + b_0)}{(s - a^0)(s - a^1)(s - a^2)(s - a^3)}, \tag{2.10}$$

and

$$b_3 = \alpha,$$
$$b_2 = -\alpha(a^0 + a^1 + a^2) + a^2c^2c^0w_{12}w_{22} + a^1c^1c^0w_{11}w_{21} + a^0\alpha.$$

The coefficients $b_1$ and $b_0$ are likewise combinations of the $a^i$, $c^i$ and $w_{ij}$; their
expressions are not legible in the source.

Thus, by properly choosing $w_{11}$, $w_{12}$, $w_{21}$, $w_{22}$, we can realize a 4th-order linear
system. A block diagram for the realization is shown in Figure 2.3.
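
The matrix form can be cross-checked against a node-by-node simulation. The sketch below assumes identity (linear) transfer functions, the node equations $x_k^0 = a^0x_{k-1}^0 + u_k$, $x_k^i = a^ix_{k-1}^i + w_{1i}y_k^0$, $x_k^{n-1} = a^{n-1}x_{k-1}^{n-1} + \sum_i w_{2i}y_k^i$ with $y_k^i = c^ix_k^i$, and the lower-triangular $A$, $B$, $C$ that follow from them. The helper names and all numerical values are hypothetical.

```python
def build_abc(a, c, w1, w2):
    """Assemble A, B, C of the linear form for linear transfer functions.

    a, c   : lists of length n (node parameters a^i, c^i)
    w1, w2 : lists of length n-2 (weights w_{1i}, w_{2i}, i = 1..n-2)
    """
    n = len(a)
    alpha = sum(w1[i - 1] * w2[i - 1] * c[0] * c[i] for i in range(1, n - 1))
    A = [[0.0] * n for _ in range(n)]
    B = [0.0] * n
    A[0][0], B[0] = a[0], 1.0
    for i in range(1, n - 1):
        A[i][0] = w1[i - 1] * c[0] * a[0]
        A[i][i] = a[i]
        B[i] = w1[i - 1] * c[0]
        A[n - 1][i] = w2[i - 1] * c[i] * a[i]
    A[n - 1][0] = alpha * a[0]
    A[n - 1][n - 1] = a[n - 1]
    B[n - 1] = alpha
    C = [0.0] * (n - 1) + [c[n - 1]]
    return A, B, C


def simulate_matrix(A, B, C, us):
    """Iterate x_k = A x_{k-1} + B u_k, y_k = C x_k from x = 0."""
    n = len(B)
    x, ys = [0.0] * n, []
    for u in us:
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
        ys.append(sum(C[j] * x[j] for j in range(n)))
    return ys


def simulate_nodes(a, c, w1, w2, us):
    """Iterate the node equations directly, with identity transfer functions."""
    n = len(a)
    x, ys = [0.0] * n, []
    for u in us:
        x0 = a[0] * x[0] + u
        y0 = c[0] * x0
        hidden = [a[i] * x[i] + w1[i - 1] * y0 for i in range(1, n - 1)]
        y_hidden = [c[i] * hidden[i - 1] for i in range(1, n - 1)]
        x_last = a[n - 1] * x[n - 1] + sum(
            w2[i - 1] * y_hidden[i - 1] for i in range(1, n - 1))
        x = [x0] + hidden + [x_last]
        ys.append(c[n - 1] * x_last)
    return ys


# Illustrative n = 4 network (two hidden nodes).
a = [0.9, 0.5, -0.3, 0.2]
c = [1.0, 2.0, 0.5, 1.5]
w1, w2 = [0.7, -1.1], [0.4, 0.9]
us = [1.0, 0.0, -0.5, 0.25, 0.0, 0.0]

A, B, C = build_abc(a, c, w1, w2)
ys_matrix = simulate_matrix(A, B, C, us)
ys_nodes = simulate_nodes(a, c, w1, w2, us)
```

Both simulations should produce the same output sequence, and the diagonal of `A` contains only the $a^i$, which is why the poles of the realized system are set by the node dynamics rather than by the weights.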


Figure  2.3:  Linear  System  Representation  of  Order  4 


2.4.  Controllability  and  Observability 

The basic issues of controllability and observability for the system (2.8) will be
discussed in this section. For the definitions of controllability and observability,
interested readers may refer to [44]. We have the following:

Theorem 2.2 Suppose that

[i] $w_{ij} \ne 0$ for all $i, j$,

[ii] $c^i \ne 0$ for all $i$,

[iii] $\alpha_c \ne 0$ and $\alpha_c < \infty$.

Then, the system (2.8) is completely controllable if and only if the following
inequalities hold:

$$a^i \ne a^j \quad \text{for } i \ne j, \quad i, j = 1, 2, \ldots, n-1. \tag{2.11}$$


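Proof: (Sufficiency) We shall prove sufficiency by using the Popov-Belevitch-Hautus
rank test (see [44]). Let $A_1$ be defined as follows:

$$A_1 = [\,sI - A \;\; B\,] = \begin{bmatrix}
s - a^0 & 0 & \cdots & 0 & 0 & 1 \\
-w_{11}c^0a^0 & s - a^1 & \cdots & 0 & 0 & w_{11}c^0 \\
\vdots & & \ddots & & \vdots & \vdots \\
-w_{1(n-2)}c^0a^0 & 0 & \cdots & s - a^{n-2} & 0 & w_{1(n-2)}c^0 \\
-\alpha a^0 & -w_{21}c^1a^1 & \cdots & -w_{2(n-2)}c^{n-2}a^{n-2} & s - a^{n-1} & \alpha
\end{bmatrix}.$$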

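Obviously $A_1$ has rank $n$ if $s$ is not an eigenvalue of $A$. For $s = a^0$ and if $a^0 \ne a^i$
for $i \ge 1$, multiplying the last column by $a^0$ and adding it to the first column yields

$$A_2 = \begin{bmatrix}
a^0 & 0 & \cdots & 0 & 0 & 1 \\
0 & a^0 - a^1 & \cdots & 0 & 0 & w_{11}c^0 \\
\vdots & & \ddots & & \vdots & \vdots \\
0 & 0 & \cdots & a^0 - a^{n-2} & 0 & w_{1(n-2)}c^0 \\
0 & -w_{21}c^1a^1 & \cdots & -w_{2(n-2)}c^{n-2}a^{n-2} & a^0 - a^{n-1} & \alpha
\end{bmatrix},$$

which has rank $n$.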

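For $s = a^0$ and if $a^0 = a^i$ for some $i$, deleting the nth column of matrix $A_1$
yields a matrix $A_3$:

$$A_3 = \begin{bmatrix}
0 & 0 & \cdots & 0 & 1 \\
-w_{11}c^0a^0 & a^0 - a^1 & \cdots & 0 & w_{11}c^0 \\
\vdots & & \ddots & & \vdots \\
-w_{1(n-2)}c^0a^0 & 0 & \cdots & a^0 - a^{n-2} & w_{1(n-2)}c^0 \\
-\alpha a^0 & -w_{21}c^1a^1 & \cdots & -w_{2(n-2)}c^{n-2}a^{n-2} & \alpha
\end{bmatrix}.$$

After elementary transformations are performed on the matrix $A_3$, its last row
becomes $[0, 0, \ldots, 0, *, 0, \ldots, 0]$. Then, performing another series of elementary
transformations on the resulting matrix yields a matrix $A_4$ (only partly legible in
the source) whose recoverable diagonal entries are
$a^0,\ a^0 - a^1,\ \ldots,\ a^0 - a^{i-1},\ a^0 - a^{i+1},\ \ldots,\ a^0 - a^{n-2}$, whose last row is
$[0\ \ 0\ \cdots\ 0\ \ *\ \ 0\ \cdots\ 0]$, and which has rank $n$.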

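For $s = a^i$ ($1 \le i \le n-2$), deleting the nth column of $A_1$ yields

$$A_5 = \begin{bmatrix}
a^i & 0 & \cdots & 0 & \cdots & 0 & 1 \\
0 & a^i - a^1 & \cdots & 0 & \cdots & 0 & w_{11}c^0 \\
\vdots & & \ddots & & & \vdots & \vdots \\
0 & 0 & \cdots & 0 & \cdots & 0 & w_{1i}c^0 \\
\vdots & & & & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & \cdots & a^i - a^{n-2} & w_{1(n-2)}c^0 \\
0 & -w_{21}c^1a^1 & \cdots & -w_{2i}c^ia^i & \cdots & -w_{2(n-2)}c^{n-2}a^{n-2} & \alpha
\end{bmatrix}.$$

After performing a series of elementary transformations, it is not hard to show
that the rank of the matrix is again $n$.

For the case $s = a^{n-1}$, deleting the nth column of $A_1$ and performing one
elementary transformation on it yields

$$A_6 = \begin{bmatrix}
a^{n-1} & 0 & \cdots & 0 & 1 \\
0 & a^{n-1} - a^1 & \cdots & 0 & w_{11}c^0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & a^{n-1} - a^{n-2} & w_{1(n-2)}c^0 \\
0 & -w_{21}c^1a^1 & \cdots & -w_{2(n-2)}c^{n-2}a^{n-2} & \alpha
\end{bmatrix}.$$

Again, performing another series of elementary transformations on $A_6$, one obtains

$$A_7 = \begin{bmatrix}
a^{n-1} & 0 & \cdots & 0 & 1 \\
0 & a^{n-1} - a^1 & \cdots & 0 & w_{11}c^0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & a^{n-1} - a^{n-2} & w_{1(n-2)}c^0 \\
0 & 0 & \cdots & 0 & d_c
\end{bmatrix},$$

where $d_c$ is

$$d_c = \alpha_c\, c^0 a^{n-1}.$$

Therefore, $A_7$ also has rank $n$.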

(Necessity) Necessity is proved by contradiction. Letting $a^i = a^j$ for some
$i \ne j$, $i, j = 1, 2, \ldots, n-1$, yields a matrix $A_1$ which has rank less than $n$.

Q.E.D.
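
The rank test used in the proof can also be tried numerically. The sketch below works in exact rational arithmetic and assumes that $A$ and $B$ have the lower-triangular structure implied by the node equations (here for $n = 4$), so that the eigenvalues of $A$ are its diagonal entries $a^i$. The helper names and parameter values are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as F


def rank(M):
    """Rank by exact Gaussian elimination (entries are Fractions)."""
    M = [row[:] for row in M]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def build_ab(a, c, w1, w2):
    """Assumed lower-triangular A and input vector B of the linear NNLM."""
    n = len(a)
    alpha = sum(w1[i - 1] * w2[i - 1] * c[0] * c[i] for i in range(1, n - 1))
    A = [[F(0)] * n for _ in range(n)]
    B = [F(0)] * n
    B[0] = F(1)
    for i in range(n):
        A[i][i] = a[i]
    for i in range(1, n - 1):
        A[i][0] = w1[i - 1] * c[0] * a[0]
        A[n - 1][i] = w2[i - 1] * c[i] * a[i]
        B[i] = w1[i - 1] * c[0]
    A[n - 1][0] = alpha * a[0]
    B[n - 1] = alpha
    return A, B


def pbh_controllable(A, B):
    """PBH test: rank [sI - A, B] = n at every eigenvalue s of A."""
    n = len(A)
    for s in {A[i][i] for i in range(n)}:  # A is triangular
        test = [[(s if i == j else 0) - A[i][j] for j in range(n)] + [B[i]]
                for i in range(n)]
        if rank(test) < n:
            return False
    return True


c = [F(1)] * 4
w1, w2 = [F(1), F(2)], [F(3), F(1)]

distinct = pbh_controllable(*build_ab([F(1, 2), F(1, 3), F(1, 4), F(1, 5)], c, w1, w2))
repeated_hidden = pbh_controllable(*build_ab([F(1, 2), F(1, 3), F(1, 3), F(1, 5)], c, w1, w2))
repeated_input = pbh_controllable(*build_ab([F(1, 2), F(1, 2), F(1, 4), F(1, 5)], c, w1, w2))
```

For these values the test passes when all $a^i$ are distinct, fails when two hidden nodes share the same $a^i$, and still passes when $a^0$ coincides with a hidden-node value.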


REMARKS: 

It is easy to see from the proof that the condition $a^0 = a^i$ ($1 \le i \le n-1$) is allowed.
Thus the system is still controllable even for repeated eigenvalues $a^0 = a^i$ for some
$i$ between 1 and $n-1$. Notice that $a^{n-1} \ne a^i$ for $i = 0, \ldots, n-2$ is only a sufficient
condition for the theorem. The following example shows that the assumption may
not be necessary.

Considering a case where $n = 2$, we have the state equations

$$x_k^0 = a^0 x_{k-1}^0 + u_k. \tag{2.13}$$

(Equations (2.14) and (2.15), for $x_k^1$ and $y_k$, are illegible in the source.)

This system is obviously controllable no matter what the values of $a^0$ and $a^1$ are.
Thus, letting $a^0 = a^1 = \text{constant}$, we still have a controllable system.

A similar result regarding the observability of the system is obtained.

Theorem 2.3 Suppose that

[i] $w_{ij} \ne 0$ for all $i, j$,

[ii] $c^i \ne 0$ for all $i$,

[iii] $\alpha_c \ne 0$ and $\alpha_c < \infty$.

Then, the system (2.8) is completely observable if and only if the following
inequalities hold:

$$a^i \ne a^j \quad \text{for } i \ne j, \quad i, j = 1, 2, \ldots, n-1. \tag{2.16}$$

Proof: Necessity and sufficiency can be proved again by using the
Popov-Belevitch-Hautus rank test, namely, by checking the rank of the matrix
$[\,C^T \;\; (sI - A)^T\,]^T$. Similar arguments lead to the conclusion of this theorem, with the only
difference being that the column transformations are changed to corresponding
row transformations.

Q.E.D.


REMARK: 


The result on observability of the system holds only under the assumption that
the transfer functions of the node system are all linear.

2.5.  Separation  of  Learning  and  Control 


In this section, we shall discuss the effects of the weights on the overall
performance of the system. In Section 2.3, we showed that some of the entries of
matrices $A$ and $B$ in (2.8) contain the weights of the network. This seems to imply
that the weights could affect the dynamics of the system. However, this turns out
not to be the case. In fact, the transfer function (2.10) shows that the weights of
the network affect only the numerator of the transfer function; the eigenvalues of
the system are not affected. In general, we have the following:

$$\text{Transfer Function} = \frac{d(s; w, a, c)}{\prod_{i=0}^{n-1} (s - a^i)}, \tag{2.17}$$

where $w = (w_{ij})_{n \times n}$, $a = (a^0, \ldots, a^{n-1})$, $c = (c^0, c^1, \ldots, c^{n-1})$ and $d(s; w, a, c)$ is
a polynomial of order $n-1$ whose coefficients are linear combinations of entries
of $w$, $a$ and $c$. This property is formally stated as follows:

Property 2.1 The dynamics of the system will not be affected by changing the
weights of the network.
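
A small exact computation illustrates this property. The sketch below assumes the lower-triangular matrix $A$ implied by the node equations, computes the characteristic polynomial $\det(sI - A)$ with the Faddeev-LeVerrier recursion for two unrelated weight sets, and compares it with $\prod_i (s - a^i)$. The function names and all numbers are hypothetical.

```python
from fractions import Fraction as F


def build_a(a, c, w1, w2):
    """Assumed lower-triangular state matrix A of the linear NNLM."""
    n = len(a)
    A = [[F(0)] * n for _ in range(n)]
    alpha = sum(w1[i - 1] * w2[i - 1] * c[0] * c[i] for i in range(1, n - 1))
    for i in range(n):
        A[i][i] = a[i]
    for i in range(1, n - 1):
        A[i][0] = w1[i - 1] * c[0] * a[0]
        A[n - 1][i] = w2[i - 1] * c[i] * a[i]
    A[n - 1][0] = alpha * a[0]
    return A


def char_poly(A):
    """Coefficients of det(sI - A), highest power first (Faddeev-LeVerrier)."""
    n = len(A)
    M = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [F(1)]
    for k in range(1, n + 1):
        AM = [[sum(A[i][l] * M[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]
        ck = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(ck)
        M = [[AM[i][j] + (ck if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs


def poly_from_roots(roots):
    """Coefficients of prod (s - r), highest power first."""
    coeffs = [F(1)]
    for r in roots:
        coeffs = [p - r * q for p, q in zip(coeffs + [F(0)], [F(0)] + coeffs)]
    return coeffs


a = [F(1, 2), F(1, 3), F(-1, 4), F(1, 5)]
c = [F(1), F(2), F(1, 2), F(3, 2)]

A_first = build_a(a, c, w1=[F(7, 10), F(-11, 10)], w2=[F(2, 5), F(9, 10)])
A_second = build_a(a, c, w1=[F(-3), F(5)], w2=[F(1, 7), F(-2)])

poly_first = char_poly(A_first)
poly_second = char_poly(A_second)
```

Although `A_first` and `A_second` differ in every weight-dependent entry, both characteristic polynomials equal `poly_from_roots(a)`, so the poles are fixed by the $a^i$ alone.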


Based on this property and the fact that the NLMs are extensions of the
McCulloch-Pitts neurons, we obtain the Separation Principle of Learning and
Control, stated below. The importance of this principle lies in the fact that
before we actually use the system, we can set all $a^i$'s to be zero. We then train the
network using the backpropagation algorithm with a prespecified training set so
that the network has the desired stationary property. After training is done, the
parameters $a^i$ can be restored and thus the network will function as a normal
system.

SEPARATION PRINCIPLE OF LEARNING AND CONTROL
The training process of an NNLM and the control process after training
can be separated.


2.6.  Linearization  via  Transformation  of  Coordinates  and 
Nonlinear  Feedback 


In Section 2.3, we saw that our representation resulted in a nonlinear
discrete-time system. There are many reasons for linearizing a nonlinear system, and
many publications in the literature [29, 53, 54, 72] discuss this problem. Before
we proceed, let us look at our discrete-time system whose nonlinearity arises from
the nonlinear transfer functions. In general, the transfer functions in input nodes
of a neural network are linear. Thus, the state-space description of our system
has the form

$$\begin{aligned}
x_k^0 &= a^0 x_{k-1}^0 + u_k, \\
x_k^1 &= a^1 x_{k-1}^1 + w_{11}c^0a^0 x_{k-1}^0 + w_{11}c^0 u_k, \\
&\ \,\vdots \\
x_k^{n-2} &= a^{n-2} x_{k-1}^{n-2} + w_{1(n-2)}c^0a^0 x_{k-1}^0 + w_{1(n-2)}c^0 u_k, \\
x_k^{n-1} &= a^{n-1} x_{k-1}^{n-1} + \sum_{i=1}^{n-2} w_{2i}\, s_2(c^i x_k^i), \\
y_k &= s_3(c^{n-1} x_k^{n-1}).
\end{aligned} \tag{2.18}$$

The above equations can be written in the following form:

$$X_k = f(X_{k-1}, U_k), \tag{2.19}$$

where $X_k = (x_k^0, \ldots, x_k^{n-1})$ and $f(X_{k-1}, U_k) = (f_0(X_{k-1}, U_k), \ldots, f_{n-1}(X_{k-1}, U_k))$
is a vector of the equations which are defined above.

From  the  above,  we  know  that  the  overall  system  consists  of  a  linear  sub-system 
cascaded  by  a  nonlinear  subsystem  together  with  a  nonlinear  output  equation  (see 
Figure  2.4  ).  This  in  turn  implies  that  the  overall  system  is  a  nonlinear  one. 
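
The cascade structure can be made concrete with a short sketch of the one-step map. The code assumes a linear input node, $s_2 = \tanh$ for the hidden nodes, and the update $x_k^{n-1} = a^{n-1}x_{k-1}^{n-1} + \sum_i w_{2i}s_2(c^ix_k^i)$ for the last node; the choice of `tanh` and all numerical values are illustrative assumptions. It then checks superposition on each state component.

```python
import math


def f(x, u, a, c, w1, w2, s2=math.tanh):
    """One-step map x_k = f(x_{k-1}, u_k) of the SISO network with n nodes."""
    n = len(a)
    new = [a[0] * x[0] + u]                    # node 0 (linear input node)
    y0 = c[0] * new[0]
    for i in range(1, n - 1):                  # nodes 1..n-2: linear state update
        new.append(a[i] * x[i] + w1[i - 1] * y0)
    hidden_out = [s2(c[i] * new[i]) for i in range(1, n - 1)]
    new.append(a[n - 1] * x[n - 1]
               + sum(w * y for w, y in zip(w2, hidden_out)))  # node n-1
    return new


a = [0.9, 0.5, -0.3, 0.2]
c = [1.0, 2.0, 0.5, 1.5]
w1, w2 = [0.7, -1.1], [0.4, 0.9]

xa, ua = [1.0, 0.5, -0.2, 0.3], 0.4
xb, ub = [-0.3, 0.8, 0.6, -0.1], 1.0
xs, us = [p + q for p, q in zip(xa, xb)], ua + ub

fa = f(xa, ua, a, c, w1, w2)
fb = f(xb, ub, a, c, w1, w2)
fs = f(xs, us, a, c, w1, w2)
```

The first $n-1$ components of `fs` equal the sum of the corresponding components of `fa` and `fb`, while the last component does not, which matches the picture of a linear subsystem feeding a nonlinear one.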

It is natural to consider the problem of locally linearizing the above system via
coordinate transformations and nonlinear feedback. In general, not all nonlinear
systems can be so linearized. A necessary and sufficient condition will be given in
Section 2.6.2. Once a linearized system is obtained, it is very easy to implement
a nonlinear control law to have the system track some desired signal.


Figure  2.4:  Nonlinear  System  Representation  of  Order  n 


2.6.1.  Preliminary 

We consider a smooth discrete-time nonlinear dynamic system

$$X_{k+1} = f(X_k, U_{k+1}), \tag{2.20}$$

where $X_k = (x_k^0, x_k^1, \ldots, x_k^{n-1})$ and $U_k = (u_k^0, u_k^1, \ldots, u_k^{m-1})$ are smooth local
coordinates for the state space $M$ and input space $U$, respectively. Before discussing
feedback linearizability for (2.20), we introduce the notion of a regular static state
feedback. We call a relation

$$U_{k+1} = \alpha(X_k, V_{k+1}) \tag{2.21}$$

a regular static state feedback whenever $\frac{\partial \alpha}{\partial V}(X_k, V_{k+1})$ is nonsingular at every point
$(X_k, V_{k+1})$. Notice that this implies locally a one-to-one relation between the old
inputs $U_{k+1}$ and the new controls $V_{k+1}$. We can now formulate the notion of
feedback linearizability for (2.20):

Definition 1:

Let $(x_0, u_0)$ be an equilibrium point for (2.20), i.e. $x_0 = f(x_0, u_0)$. The system
(2.20) is feedback linearizable around $(x_0, u_0)$ if there exist

(a) A coordinate transformation $S : V \subset R^n \to S(V) \subset R^n$ defined on a
neighborhood $V$ of $x_0$ with $S(x_0) = 0$;

(b) A regular feedback $u = \alpha(x, v)$ satisfying $\alpha(x_0, 0) = u_0$ and defined on a
neighborhood $V \times O$ of $(x_0, 0)$ with $\frac{\partial \alpha}{\partial v}(x, v)$ nonsingular on $V \times O$,

such that in the new coordinates $z = S(x)$ the closed-loop dynamics are linear:

$$z(k+1) = Az(k) + Bv(k), \tag{2.22}$$

for some matrices $A$ and $B$.

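At this point, let us look at the equilibrium points of our nonlinear system. For
the system (2.20) it is not hard to show that the $x^*$, $u^*$ satisfying $f(x^*, u^*) = x^*$
have the form

$$\begin{aligned}
x^{0*} &= \frac{u^*}{1 - a^0}, \\
x^{i*} &= \frac{1}{1 - a^i}\big[w_{1i}c^0a^0x^{0*} + w_{1i}c^0u^*\big], \quad i = 1, \ldots, n-2, \\
x^{(n-1)*} &= \frac{1}{1 - a^{n-1}} \sum_{i=1}^{n-2} w_{2i}\, s_2\big(c^ia^ix^{i*} + c^iw_{1i}c^0a^0x^{0*} + c^iw_{1i}c^0u^*\big).
\end{aligned} \tag{2.23}$$

Therefore, $x^{0*}, \ldots, x^{(n-2)*}$ are all linear functions of $u^*$ but $x^{(n-1)*}$ is not.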
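
The equilibrium formulas can be checked numerically. The sketch below assumes $s_2 = \tanh$, a linear input node, $a^i \ne 1$, and the closed forms $x^{0*} = u^*/(1 - a^0)$, $x^{i*} = w_{1i}c^0(a^0x^{0*} + u^*)/(1 - a^i)$, with the last component obtained from the sum over hidden-node outputs divided by $1 - a^{n-1}$. Names and numbers are hypothetical.

```python
import math


def f(x, u, a, c, w1, w2, s2=math.tanh):
    """One-step map of the SISO network (linear input node, s2 on hidden nodes)."""
    n = len(a)
    new = [a[0] * x[0] + u]
    y0 = c[0] * new[0]
    for i in range(1, n - 1):
        new.append(a[i] * x[i] + w1[i - 1] * y0)
    new.append(a[n - 1] * x[n - 1]
               + sum(w2[i - 1] * s2(c[i] * new[i]) for i in range(1, n - 1)))
    return new


def equilibrium(u, a, c, w1, w2, s2=math.tanh):
    """Closed-form fixed point x* for a constant input u (requires a^i != 1)."""
    n = len(a)
    x = [u / (1.0 - a[0])]
    for i in range(1, n - 1):
        x.append(w1[i - 1] * c[0] * (a[0] * x[0] + u) / (1.0 - a[i]))
    total = sum(
        w2[i - 1] * s2(c[i] * (a[i] * x[i] + w1[i - 1] * c[0] * (a[0] * x[0] + u)))
        for i in range(1, n - 1))
    x.append(total / (1.0 - a[n - 1]))
    return x


a = [0.9, 0.5, -0.3, 0.2]
c = [1.0, 2.0, 0.5, 1.5]
w1, w2 = [0.7, -1.1], [0.4, 0.9]

x_star = equilibrium(0.3, a, c, w1, w2)
x_next = f(x_star, 0.3, a, c, w1, w2)
x_star_double = equilibrium(0.6, a, c, w1, w2)
```

`x_next` reproduces `x_star`, and doubling the input doubles the first $n-1$ equilibrium components but not the last one, in line with the remark above.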


2.6.2.  Necessary  and  Sufficient  Conditions  for  Local  Linearization  via 
Transformation  of  Coordinates  and  Nonlinear  Feedback 


In  this  section,  we  are  going  to  use  Grizzle’s  necessary  and  sufficient  conditions 
[72]  to  prove  that  our  nonlinear  system  is  locally  linearizable  to  a  controllable 
linear  system.  Before  we  formally  give  the  result  in  the  next  section,  let  us  look 
at  a  sequence  of  distributions  given  by  Grizzle  in  [72].  This  sequence  will  be 
instrumental  in  the  solution  of  the  feedback  linearization  problem  for  (2.20). 

Let $\pi : M \times U \to M$ be the canonical projection and $K$ the distribution defined
by

$$K = \ker f_*, \tag{2.24}$$

where $M \subset R^n$, $U \subset R^m$ and $f_*$ is the dual vector space homomorphism from
$TM \times TU$ to $TM$.

Algorithm 2.1

Assume $f_*$ has full rank around $(x_0, u_0)$.

Step 0: Define the distribution $D_0$ in a neighborhood of $(x_0, u_0)$ in $M \times U$ by

$$D_0 = \pi_*^{-1}(0). \tag{2.25}$$

Step i+1: Suppose that around $(x_0, u_0)$, $D_i + K$ is an involutive constant
dimensional distribution on $T(M \times U)$. Then define in a neighborhood of
$(x_0, u_0)$

$$D_{i+1} = \pi_*^{-1} f_*(D_i), \tag{2.26}$$

and stop if $D_i + K$ is not involutive or constant dimensional.

The  effectiveness  of  the  above  algorithm  rests  upon  the  following  observation. 

Lemma 2.1 Let $(x_0, u_0)$ be an equilibrium point of (2.20), and assume that $f_*$ has
full rank around $(x_0, u_0)$. Let $D$ be an involutive constant dimensional distribution
on $M \times U$ such that $D + K$ is also involutive and constant dimensional. Then
there exists a neighborhood $O$ of $(x_0, u_0)$ such that $f_*(D|_O)$ is an involutive constant
dimensional distribution around $x_0$.

Based on the above algorithm and lemma, Grizzle [72] states necessary and
sufficient conditions for locally linearizing a nonlinear system to a controllable one.

Theorem 2.4 (Grizzle) Consider the discrete-time nonlinear system (2.20),
about the equilibrium point $(x_0, u_0)$. The system (2.20) is linearizable around
$(x_0, u_0)$ to a controllable linear system if and only if Algorithm 2.1 applied to the
system (2.20) gives distributions $D_0, \ldots, D_n$ such that $\dim(D_n) = n + m$.


The proof of the above lemma and theorem can be found in [72]. In the next
section, we are going to show that our nonlinear system satisfies the conditions of
the above theorem and thus the system is locally linearizable.

2.6.3.  Main  Result 

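Now, let us consider our nonlinear system (2.20) in which $f(x, u)$ has the form

$$f(x, u) = \begin{bmatrix}
a^0x^0 + u \\
w_{11}c^0a^0x^0 + a^1x^1 + w_{11}c^0u \\
\vdots \\
w_{1(n-2)}c^0a^0x^0 + a^{n-2}x^{n-2} + w_{1(n-2)}c^0u \\
a^{n-1}x^{n-1} + \sum_{i=1}^{n-2} w_{2i}\, s_2(c^ia^ix^i + c^iw_{1i}c^0a^0x^0 + c^iw_{1i}c^0u)
\end{bmatrix}, \tag{2.27}$$

where $x = (x^0, x^1, \ldots, x^{n-1})$ and $u$ is a scalar. Before we present the main
theorem, we shall state and prove some lemmas, which will be used later.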

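Lemma 2.2 Consider the nonlinear system (2.20) and the nonlinear function
$f(x, u)$ of (2.27). If $a^i \ne 0$ for $0 \le i \le n-1$, then $f_*$ has full rank around the
equilibrium point $(x^*, u^*)$.

Proof: By noting that $f : M \times U \to M$ is given by (2.27), we can evaluate
$f_* : TM \times TU \to TM$ by considering the natural basis $(\partial/\partial x^0, \ldots, \partial/\partial x^{n-1})$ in $TM$ and
$\partial/\partial u$ in $TU$, where $M \subset R^n$, $U \subset R$, and $TM$ and $TU$ are the tangent spaces for $M$
and $U$, respectively. With respect to the corresponding basis in the image of $f_*$, the map
$f_*$ is represented by a matrix $A = (a_{ij})$, given by $a_{ij} = \partial f_j / \partial x^i$, $i, j = 0, 1, \ldots, n-1$,
and $a_{nk} = \partial f_k / \partial u$, $k = 0, 1, \ldots, n-1$.

Thus,

$$A = \begin{bmatrix}
a^0 & w_{11}c^0a^0 & \cdots & w_{1(n-2)}c^0a^0 & \sum_{i=1}^{n-2} w_{2i}s_2'(\cdot)c^iw_{1i}c^0a^0 \\
0 & a^1 & \cdots & 0 & w_{21}s_2'(\cdot)c^1a^1 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & a^{n-2} & w_{2(n-2)}s_2'(\cdot)c^{n-2}a^{n-2} \\
0 & 0 & \cdots & 0 & a^{n-1} \\
1 & w_{11}c^0 & \cdots & w_{1(n-2)}c^0 & \sum_{i=1}^{n-2} w_{2i}s_2'(\cdot)c^iw_{1i}c^0
\end{bmatrix}.$$

Since $a^i \ne 0$ for $0 \le i \le n-1$, rank($A$) = $n$ and thus $f_*$ has full rank around
$(x^*, u^*)$.

Q.E.D.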
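
The rank condition of Lemma 2.2 can be illustrated numerically. The sketch below assumes $s_2 = \tanh$ and builds the $(n+1) \times n$ matrix of partial derivatives of the one-step map, with row $i < n$ holding $\partial f_j / \partial x^i$ and the last row holding $\partial f_j / \partial u$; its rank is then estimated by Gaussian elimination with a tolerance. Function names, the tolerance and all numbers are hypothetical.

```python
import math


def jacobian(x, u, a, c, w1, w2):
    """(n+1) x n matrix: row i < n holds d f_j / d x^i, row n holds d f_j / d u."""
    n = len(a)
    ds = [0.0] * n  # derivative of s2 = tanh at each hidden node's argument
    for i in range(1, n - 1):
        z = c[i] * (a[i] * x[i] + w1[i - 1] * c[0] * (a[0] * x[0] + u))
        ds[i] = 1.0 - math.tanh(z) ** 2
    J = [[0.0] * n for _ in range(n + 1)]
    J[0][0], J[n][0] = a[0], 1.0
    for i in range(1, n - 1):
        J[0][i] = w1[i - 1] * c[0] * a[0]
        J[i][i] = a[i]
        J[n][i] = w1[i - 1] * c[0]
        J[i][n - 1] = w2[i - 1] * ds[i] * c[i] * a[i]
    g = sum(w2[i - 1] * ds[i] * c[i] * w1[i - 1] * c[0] for i in range(1, n - 1))
    J[0][n - 1] = g * a[0]
    J[n - 1][n - 1] = a[n - 1]
    J[n][n - 1] = g
    return J


def rank(M, tol=1e-10):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    rows, r = len(M), 0
    for col in range(len(M[0])):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < tol:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][col] / M[r][col]
            M[i] = [p - factor * q for p, q in zip(M[i], M[r])]
        r += 1
    return r


c = [1.0, 2.0, 0.5, 1.5]
w1, w2 = [0.7, -1.1], [0.4, 0.9]
x, u = [1.0, 0.5, -0.2, 0.3], 0.4

rank_full = rank(jacobian(x, u, [0.9, 0.5, -0.3, 0.2], c, w1, w2))
rank_deficient = rank(jacobian(x, u, [0.9, 0.0, -0.3, 0.2], c, w1, w2))
```

With all $a^i$ nonzero the rank equals $n = 4$; setting $a^1 = 0$ removes one independent row and the rank drops to 3.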

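Lemma 2.3 Let the conditions in Lemma 2.2 be satisfied. Let $D$ be a subspace
in $TM \times TU$. Then

$$\dim(f_*(D)) = \begin{cases} \dim(D) & \text{if } \dim(D) \le n, \\ n & \text{if } \dim(D) = n+1. \end{cases}$$

Proof: Case (i) $\dim(D) \le n$:

Suppose that $\dim(D) = p \le n$ and let $\{X_1, \ldots, X_p\}$ be the basis in $D$. Then,
without loss of generality, we have

$$\begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix} = P \begin{bmatrix} \partial/\partial x^0 \\ \vdots \\ \partial/\partial x^{n-1} \\ \partial/\partial u \end{bmatrix}, \tag{2.28}$$

where $P \in R^{p \times (n+1)}$ and rank($P$) = $p$. Then, applying $f_*$ to both sides of (2.28),
the images $W_1, \ldots, W_p$ of $X_1, \ldots, X_p$ under $f_*$ are given by the rows of $PA$.

To prove that $f_*(D) = \mathrm{span}(W_1, \ldots, W_p)$ has dimension $p$, it suffices to show that
rank($PA$) = $p$. Indeed, the fact that rank($P$) = $p \le n$ and rank($A$) = $n$ implies that
rank($PA$) = $p$. Thus, $\dim(f_*(D)) = \dim(D)$.

Case (ii) $\dim(D) = n+1$:

Then (2.28) still holds, but in this case $P \in R^{(n+1) \times (n+1)}$ and rank($P$) = $n+1$. The
fact that rank($PA$) = $n$ follows from Sylvester's inequality on the rank of the product of
two matrices, i.e. the rank inequality for matrices $A \in R^{m \times n}$, $B \in R^{n \times q}$:

$$\mathrm{rank}(A) + \mathrm{rank}(B) - n \le \mathrm{rank}(AB). \tag{2.29}$$

Therefore, $\dim(f_*(D)) = n$.

Q.E.D.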

Lemma 2.4 Let $\pi$ denote the canonical projection from $M \times U$ onto $M$ given
by $\pi(x, u) = x$, and let $Q$ be a subset of $TM$ with $\dim(Q) = p \le n$. Then
$\dim(\pi_*^{-1}(Q)) = p + 1$.

Proof: Let $Y_1, \ldots, Y_p$ be a basis in $Q$. Then any vector field in $Q$ can be
represented by $\sum_i \alpha_i Y_i$, that is,


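Let $K = \ker f_*$. Note that $f_* : TM \times TU \to TM$, with $TM \times TU \subset R^n \times R$
and $TM \subset R^n$. Therefore $f_*(\alpha^1, \ldots, \alpha^n, \beta) = (\alpha^1, \ldots, \alpha^n, \beta)A$. The equality
$f_*(\alpha^1, \alpha^2, \ldots, \alpha^n, \beta) = 0$ implies that

$$(\alpha^1 \ \cdots \ \alpha^n \ \ \beta)\, A = 0. \tag{2.31}$$

Let $T$ be the nonsingular upper-triangular matrix (its entries are only partly legible
in the source) for which

$$AT = \begin{bmatrix}
a^0 & 0 & \cdots & 0 \\
0 & a^1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & a^{n-1} \\
1 & 0 & \cdots & 0
\end{bmatrix}.$$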

Now right-multiplying (2.31) by T yields:

$$(a^1 \ \cdots \ a^n \ a)\, A T = 0,$$

from which we conclude that $a^2 = \cdots = a^n = 0$. But $a = -a^0 a^1$.

So,

$$K = \mathrm{span}\left(\frac{\partial}{\partial x^1} - a^0 \frac{\partial}{\partial u}\right) = \mathrm{span}(Y),$$

where $Y = \frac{\partial}{\partial x^1} - a^0 \frac{\partial}{\partial u}$ and dim(K) = 1.

Step 2: Let $D_0 = \pi_*^{-1}(0)$. Then we have $D_0 = \mathrm{span}(\partial/\partial u)$ and $\dim(D_0) = 1$ from
Lemma 2.4. Let $D_{i+1} = \pi_*^{-1} f_*(D_i)$ for $0 \le i < n$.

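Suppose now $\dim(D_i) = p$ and $X_1, \dots, X_p$ are a basis for $D_i$. Then we have

$$\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = P \begin{pmatrix} \partial/\partial x^1 \\ \vdots \\ \partial/\partial x^n \\ \partial/\partial u \end{pmatrix},$$

where $P \in \mathbb{R}^{p \times (n+1)}$. Thus, $[X_1, \dots, X_p, Y]$ is a basis for $D_i + K$, and

$$\begin{pmatrix} X_1 \\ \vdots \\ X_p \\ Y \end{pmatrix} = \begin{pmatrix} P \\ \rho \end{pmatrix} \begin{pmatrix} \partial/\partial x^1 \\ \vdots \\ \partial/\partial x^n \\ \partial/\partial u \end{pmatrix},$$

where $\rho = [1 \ 0 \ 0 \ \dots \ 0 \ -a^0]$. Obviously, $[X_i, X_j] = 0$ for $i \neq j$ and
$[X_i, Y] = 0$ for all i. Therefore, $D_i + K$ is involutive and has constant dimension.
Repeatedly applying Lemma 2.2 and Lemma 2.3 on $D_i$ and using induction on
i, we obtain a $D_n$ whose dimension is n+1. It follows from Grizzle's necessary
and sufficient condition that the nonlinear system is linearizable to a controllable
linear system.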

Q.E.D. 


2.7.  Discussion 

The lack of a rigorous mathematical representation of control systems in current
paradigms of feedforward and recurrent neural networks is a drawback to the
development of research on neural networks for control. Feedforward networks
are known to work as a mapping between two information domains. Most of the
current research on neural networks for control, and the related publications, discuss
using this type of neural network to “learn” a model or a controller, which is
usually either highly nonlinear or hard to implement. The published results show
that these approaches are satisfactory in some cases. However, there has been little
attempt to relate the theory of classical and modern control systems
to this type of neural network. Neural networks of this type are always treated as
a “Black Box”, and thus there is no direct contact with the “internal” information
of the box. A classical linear control system, which may also be called a “Black
Box”, can be represented by a transfer function, and thus its
input-output performance can be studied thoroughly. This work exploits the
“internal information” of the network and attempts to represent the control systems
in terms of this information. Therefore, the network itself is not only a control
system, but it is also capable of learning. In this sense, the paradigm presented
here may be viewed as an extension of current recurrent networks.

As quoted in [93] by Williams: “While much of the recent emphasis in the
field has been a multilayer network having no feedback connections, it is likely
that the use of recurrently connected networks will be of particular importance
for applications to the control of dynamical systems”. Indeed, because of the
incorporation of feedback or dynamics inside the networks, recurrent networks
show great promise for future research on neural networks for
control. The property that the Hopfield net has a Content Addressable Memory
provides a way of implementing many practical problems, e.g., traveling salesman
problems. Another particular type of recurrent network, a settling network, has
also been widely recognized as important in connectionist circles. Such a network
converges to a stable state from any starting state. The final state of such a
network can be viewed as the solution to a certain constraint-satisfaction type
of search, as in relaxation labeling, or it might be viewed as a retrieved item
from a content-addressable associative memory. Despite this, the ambiguity of the
information stored in such networks hinders the direct use of that information,
and thus there is very limited use for this type of network for control purposes.

Our attempt is to formulate the control systems mathematically inside the
neural networks. We can easily represent each linear SISO system in the neural
network by introducing a small feedback loop INSIDE each neuron, rather than
a feedback connection. For this paradigm of neural networks, we can directly use
the internal states to construct a feedback control law. What is more important
is that a network of this type is itself a system, not an unknown “Black
Box”. Thus its input-output performance can be studied just as in the case of the
classical control system. Based on this observation, many conventional synthesis
methods can be directly borrowed to design the system. The stationary property
of the system can be preassigned by means of learning, a unique feature that the
classical control system does not have.

Of course, this is only a first step in this direction of research. There are still
many interesting open problems, such as:

1. Designing a controller which is also a neural network of the same structure,
and then applying the controller to the system modelled by the neural networks
discussed in this chapter. It is interesting to study this type of mixed
network and to explore its properties.

2. Carrying out research in the case of multivariable systems. It is straightforward
to extend the current results to a multi-input and multi-output system.
However, extension of the results on linearizability is not trivial and requires
further study.

3. Considering how to construct the training set. By the Separation Principle
of Learning and Control, the systems of this chapter can be regarded as
networks when the dynamic parameters are set to zero. Thus they have
the capacity of learning. The construction of a training set is an interesting
problem. Also by this same Principle, it is not hard to show that it is possible
to construct a training set such that after the network has learned, it has
the desired stationary properties. Thus, the problem of how to construct a
training set so that the trained system has the desired stationary property
needs to be investigated.

4. Considering the applications of the results in this chapter to differential
game problems. Differential game problems can be modeled by an NNLM,
and a control strategy for each player can be obtained using various controller
design techniques. In this case, the NNLM used for modeling the differential
game has at least two inputs, since most differential games have at least two
players. This application is very interesting and is worthy of further study.
Although the results in this chapter will not be directly used in the next chapter,
they do have potential applications for such kinds of problems. For example,
by modeling a differential game problem using an NNLM and by designing
controllers using additional neural networks without local dynamics, we obtain
a network consisting of several small neural networks. Studying such a
network is also interesting.
We hope that results of this kind will appear in the near future.

3. Differential Games with Neural Networks

Rufus Isaacs [40] first looked into the theory of modeling tactical encounters in
what he termed “Differential Games” in his seminal Rand report [30]. Isaacs
assumed a differential model for aircraft dynamics. He also assumed that the roles
of pursuer and evader are fixed for the duration of an encounter. Only the pursuer
was assumed to have weapon capabilities that could be modeled by a hyper-surface
in the state space. In the more than two decades since Isaacs’ pioneering work, there
have been many publications in the literature on this subject. However, the
problems studied in the literature were different from that studied by Isaacs in
his original report. There are many practical systems which can be modeled by
differential game problems. For example, cooperative and non-cooperative games
are typical differential games in the area of economic systems. Up to now, it is
commonly agreed that there is a strong theoretical foundation in the
field of differential games.

However, there are some reasons why differential games have not had widespread
use in air combat, whose arena is one of the most complex dynamic systems. In
fact, the applications of conventional differential game theory to many practical
problems are limited for the following reasons. First, the classical solution
of a differential game is based on simultaneous backwards integration of the state
and adjoint equations, starting at the target set (terminal manifold) of the game,
in order to fill the entire game space by the ensemble of optimal trajectories. The
backwards integration is a rather direct operation as long as no singular surface
of the game exists. In a simple game, with no more than two state variables, the
existence of the singular surface can be easily visualized. For a differential game
of three independent state variables the same process becomes very cumbersome,
and for dynamic models of higher dimension, it is virtually impossible. Second,
at the other end, the numerical solution of a two-point boundary value problem
associated with the differential game satisfies only the necessary conditions of
optimality; it cannot identify the singular surfaces of the game and has no
tool to verify the sufficiency conditions of the game solution. Third, as illustrated
by an example in [81], Shinar indicates that the frame of classical differential
game theory is too limited to accommodate even relatively simple models of a
“real-world” dynamic conflict. To be more exact, the assumption that the roles
of pursuer and evader are fixed during an encounter is not reasonable, since all
participants in an encounter have weapon systems that can be modeled by a
continuous probabilistic function. Fourth, in future air combat most engagements
will start at rather long (beyond visual) ranges. Thus, the initial conditions of
the above described two-target game are generally in the “draw” zone. Therefore,
the only guaranteed outcome of such a non-cooperative game is a “draw”. This
result, which denies the very essence of air-to-air combat and consequently
the justification of the high cost of advanced aircraft and missile development, is
clearly unacceptable from an operational point of view.

As a consequence of the difficulties outlined in the previous paragraph, one is
strongly tempted to search for an alternative approach to the analysis of dynamic
conflicts. A priori, Artificial Intelligence seems to represent such an alternative.
An interesting concept introduced in [77] is the OODA loop (Observe, Orient,
Decide and Act) in a cyclical maneuver. The OODA concept is based on the
fact that in current classical air combat at short range, pilots have been using
their eyes as sensors and their brains to integrate the visual and sensory-supplied
information necessary to play the game. Although the OODA loop undoubtedly
plays an important role in air combat pilotage, its limitations are obviously those
of human ability in supersonic combat environments. Humans are characterized
by a limited processing rate: two events taking place in less than about one
tenth of a second will generally be perceived as a single event. Another limitation
concerning the processing rate involves the fact that an activity of integrated
percepts, decision and motor action is performed. To overcome these difficulties,
Rodin et al. proposed in [77] an Artificial Intelligence approach to designing
an operational on-board system, called “Tactical Decision Aid Expert Systems
(TDAES)”, to support pilots in tactical decision making processes. The Expert
System generates an initial flight and action plan (initial mission). The optimal
plan gets reevaluated and possibly changed every time an unforeseen event
takes place. The system employs a basic set of pursuit-evasion algorithms for
suboptimal mission generation.

In their pioneering work, Rodin et al. proposed in [77] the use of artificial
intelligence methodology in air combat games. The work was based on the semantic
control paradigm proposed in [76] by Rodin. In this paradigm, a control problem
is broken into three blocks. Their functions, when applied to a situation governed
by differential games (for instance), are as follows:

(i) Identifier: The Identifier block identifies the differential game, parameters and
role that the aircraft should assume in order to destroy an assigned target.

(ii) Goal Selector: The Goal Selector solves the differential game chosen by
the Identifier block. The results are the optimal trajectories, barriers and
controls.

(iii) Adaptor: The Adaptor determines the controls that cause the aircraft to
“best” follow the optimal trajectory determined by the Goal Selector.

In [95], Weil used Artificial Intelligence methodologies to splice together the
solutions of low-order one-target models in a sub-optimal fashion that will be useful
in air combat. He explored the characteristics of a self-modifying system. Among
the methodologies he used, the artificial neural net approach is impressive. Several
neural nets are used in his approach. One net is the classification net that determines
which differential game from the knowledge base will be chosen, along with some
of the criteria that might be used to make the decision. Once a differential game
is chosen, another neural net is assigned to the game so that it can determine
the parametrizations of the game. The method of generating the training set for
each net is unique in the sense that, instead of generating the training set off-line
by using traditional methods, the training is done by running the closed-loop
simulation forward and adjusting the weights by propagating errors backward in
time. Training each individual net is relatively independent.

Based on the above semantic control paradigm, we present here a new approach
to using neural networks in differential game problems. The approach reflects the
fundamental phases in real-life conflicts. Instead of building up a knowledge base,
we implement the neural nets in the Identifier and Adaptor blocks of the semantic
control paradigm so that the approach is more general and is capable of working
on-line during an entire encounter. Generating training sets for these two neural
nets is different from other approaches [49, 95] in the sense that the nets in our
approach do not learn the optimal trajectories generated by optimal controls but
learn the mapping between two information regions.

This chapter first discusses the fundamental phases in real-life conflicts and
gives a general configuration summarizing these phases. The configuration provides
a basis for further analysis and the paradigm for differential games with
neural networks. In Section 3.2, two paradigms of differential games with neural
networks are given, with emphasis on one player and two players respectively.
These paradigms represent semantic control. One contribution of this approach
is the Identifier, which takes the environmental information and outputs an estimate
of the opponent’s strategy. In Section 3.3, the INTERCHANGEABILITY
conditions will be discussed. Interchangeability is a basic assumption throughout
this study. In Section 3.4, several algorithms with the same basic frame will be
given. A detailed discussion of each stage and of the implementation of the
algorithms is also given. In Section 3.5, a detailed study of a pursuit-evasion game
problem will be given. The simulation results are quite satisfactory. Discussion
of the simulation results with the algorithms of different paradigms, directions of
further research on this topic, and the conclusions will be given as well.

3.1.  Motivation 

In real-life conflicts, the actions of two players usually consist of three phases:
discovering the opponent, finding what this opponent is doing, and making a
decision. Upon finding the opponent, the pilot will first identify his maneuver,
and then react according to his opponent's perceived action. Sometimes, it is hard
to distinguish between phases because some phases may last for very short periods.
These three phases are the same in all air combat problems, though the last two
phases may repeat over successive time intervals; e.g., the pilot may change
from evasion to pursuit upon finding that the action of his opponent changes
from attack to withdrawal.

The above process is similar to the OODA loop outlined in the last section,
and these phases can be formally stated as the following configuration. We assume
that there are two players engaged during the entire encounter of the game. Each
player employs the three phases in the process of making a control decision. We
only give the phases for one player because the phases for the other player are the
same. For each player, each individual phase may start at a different instant.

General Configuration of Phases in Real-life Conflicts:

Phase 1: Discovery of the opponent

Phase 2: Identification of the actions of this opponent

Phase 3: Making a decision for the next action according to this opponent’s
perceived strategy


Phase 1 is self-explanatory. Immediately after finding his opponent, the player
identifies an approximation to the strategy and maneuver and obtains information
about the range, bearing, heading and speed of his opponent. Gathering this
information is accomplished in phase 2. In our discussion, examples of strategies
may be pursuit, evasion, or disengagement. Typical examples of maneuvers may
be found in [57, 80]. Techniques used for identifying tactical air combat
maneuvers take the form of decision-making, scheduling, and control systems
[10, 22, 37, 64]. Recent work using neural networks in diverse ways for the
classification of underwater sonar targets [27] and the recognition of radar targets [71]
shows great promise in the area of maneuver identification. The information about
the range, bearing, heading and speed of his opponent is generally gathered by
means of mathematical devices. Many times, this information is crucial to the
player’s control decision making. In phase 3, the player reacts according to his
opponent's perceived action. His strategies might be pursuit, evasion, or
disengagement. Generally speaking, his strategy is some type of generic mapping of his
opponent’s strategy, which in some cases may be a continuous/discrete-time type
of mathematical function. In light of the above, his strategy could be


$$\text{strategy} = \begin{cases} \text{evasion} & \text{if his opponent is pursuing;} \\ \text{pursuit} & \text{if his enemy is escaping and a chance exists;} \\ \text{observation} & \text{if both sides choose to be peaceful.} \end{cases}$$
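
The phase 3 mapping above can be written as a small lookup. This is only an illustrative sketch: the names `Strategy` and `select_strategy`, the boolean flag `chance_exists`, and the choice to return `None` for situations the rule does not cover are all assumptions made here, not part of the report.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    PURSUIT = "pursuit"
    EVASION = "evasion"
    OBSERVATION = "observation"


def select_strategy(opponent: Strategy, chance_exists: bool = False) -> Optional[Strategy]:
    """Map the opponent's perceived strategy to the player's own strategy."""
    if opponent is Strategy.PURSUIT:
        # The opponent is pursuing: evade.
        return Strategy.EVASION
    if opponent is Strategy.EVASION and chance_exists:
        # The enemy is escaping and a chance exists: pursue.
        return Strategy.PURSUIT
    if opponent is Strategy.OBSERVATION:
        # Both sides choose to be peaceful: keep observing.
        return Strategy.OBSERVATION
    # Assumption: the rule in the text leaves this case open.
    return None


print(select_strategy(Strategy.PURSUIT))
print(select_strategy(Strategy.EVASION, chance_exists=True))
```

Because phases 2 and 3 alternate during an encounter, such a function would be re-evaluated every time the Identifier updates its estimate of the opponent's strategy.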


REMARKS:

1. In this Configuration, one player should be in one and only one phase at any
instant, although sometimes it is hard to distinguish between phases because
of their very short periods. For example, one player usually identifies what
strategy his opponent is taking almost at the same time he discovers his
opponent, and thus it is usually hard to distinguish between phase 1 and
phase 2 for this player.

2. At any instant, two players may be in different phases because the phases for
each individual player may last for different periods. Moreover, the last two
phases for each individual player would alternate as time passes. However,
the mutual and continual observation of each player by his opponent is also
assumed, as is their mutual ability to identify the opponent's maneuvers.

3. Although the dependence of the next action on the opponent’s strategy is usually
not clear in general zero-sum differential game problems, this configuration
is realistic for many applications of differential games and other real-life
conflicts. In fact, in general zero-sum differential game problems, each
player is supposed to play his own optimal strategy regardless
of what his opponent does. Failing to do so on one side may do the other
side a favor by increasing or decreasing the relevant cost function. However,
not only are there some categories of differential game problems in which
the optimal strategy of one side depends on the strategy of the other side,
but also real life seems to be like that. A pilot will rarely fly an absolute
optimal trajectory, and a good pilot will overcome a poor one by capitalizing
on the latter’s mistakes. Thus, he will formulate the plan of his actions in
accordance with what his opponent happens to be doing.

To justify the general configuration given previously, we shall give the following
simple two-player game problem in which E denotes one player (e.g. an evader)
and P the other (e.g. a pursuer). In this example, we shall omit phases 1 and
2 of the general configuration, since they are conceptually simple in this case, and
emphasize phase 3.

Example:

Consider a simple game problem: in a 2-dimensional plane, $\mathcal{E}$ is the upper
half-plane and $\mathcal{C}$ is the x-axis. The vectogram for E is shown in Figure 3.1; it has a
downward unit vertical component and a horizontal headline of half-length u(x, y),
a positive and smooth function. That of P is circular of radius w(x, y), again a
smooth, positive function with always w < u and, for some constant c, $w \le c < 1$.
The total velocity for x is to be the vector sum of a choice from each vectogram.
Analytically all this means that the kinematic equations are

$$\dot{x} = u(x, y)\,\phi + w(x, y)\sin\psi, \tag{3.1}$$

$$\dot{y} = -1 + w(x, y)\cos\psi, \qquad -1 \le \phi \le 1. \tag{3.2}$$

The payoff will be terminal with H = x on $\mathcal{C}$ (where y = 0). Thus E will strive
to have x reach $\mathcal{C}$ at a point as far right as possible, and P similarly struggles for
the left.

Always E will play his rightmost vector ($\phi = 1$ in the KE). In Figure 3.2 let
XA be this vector. The line XB is tangent from X to a circle of radius w (w
is reckoned at X) and center A. Then XB is a properly oriented semipermeable
direction. If a family of curves is drawn (an ordinary differential equation solved)
having these directions as those of their tangents at each point, these curves will
be semipermeable. If each is labeled with the value of H at its meeting with $\mathcal{C}$, the
labelings will constitute V(x).

Figure 3.1: Vectograms of Players E and P, panels (a) and (b)


(3.3) 

(3.4) 


A paradigm for differential games with neural networks is shown in Figure 3.3. It
is called a one-player paradigm of differential games with neural networks because
only one side of the game is considered. Figure 3.3 clearly represents the general
configuration discussed in the last section, although the first phase is omitted. The
paradigm consists of two stages. Stage I represents one player’s process of making
a control decision, which imitates phase 2 and phase 3 in the general configuration.
In this stage, the neural identifier takes the environmental information and gives
the estimate of the opponent’s control strategy. The input information to the
Neural Identifier is relatively independent of the system state variables because
the network takes only the environmental information, which usually comes from
the sensor, e.g., from readings of an odometer. To represent phase 3 of the general
configuration, we use another neural net, called the Neural Controller, for the control
purpose. Based on the estimate of the opponent’s control strategy and the optimal
criterion given by the user, the neural controller gives the optimal control for the
player. The information on the state variables x is needed for evaluation of the
optimal control because the neural net controller is usually viewed as a part of
the system for stage I.


In stage II, because we only consider the one-player paradigm, we simply put a
neural net controller, which represents the process of making a control
decision for the other player. We assume no control over that player, and he can
make a control decision based on his own criterion.


[Block diagram; labels: System, stage I, stage II, NN Controller, NN Identifier, Goal selector, Adaptor, environment.]

Figure 3.3: One-player Paradigm of Differential Games With Neural Networks (a)

The system is described by a set of differential equations

$$\dot{x} = f(x, \phi, \psi), \qquad x(t_0) = x_0, \tag{3.6}$$

where $f(x, \phi, \psi)$ is an n-dimensional vector of continuously differentiable functions
and x is an n-dimensional state vector. The goal for both players is to choose $\phi$
and $\psi$ to maximize and to minimize, respectively, the cost function $J(\phi, \psi, x_0)$, i.e.,

$$\max_{\phi \in \Phi} \min_{\psi \in \Psi} J(\phi, \psi, x_0), \tag{3.7}$$

where $\Phi$ is the feasible set for $\phi$ and $\Psi$ is the one for $\psi$. The optimal value is
denoted by $J_{\mathrm{optimal}}$:

$$J_{\mathrm{optimal}} = \max_{\phi \in \Phi} \min_{\psi \in \Psi} J(\phi, \psi, x_0). \tag{3.8}$$

Now let us define $J_\psi$ by

$$J_\psi = \max_{\phi \in \Phi} J(\phi, \psi, x_0), \qquad \psi \in \Psi, \tag{3.9}$$

where again $\Phi$ and $\Psi$ are the feasible sets which contain all possible values for $\phi$,
$\psi$ respectively. We have

$$J_{\mathrm{optimal}} = \min_{\psi \in \Psi} J_\psi. \tag{3.10}$$

Obviously, $J_\psi$ is a functional of $\psi$. Replacing $\psi$ by $\hat{\psi}$, the estimate of $\psi$, yields

$$J_{\hat{\psi}} = \max_{\phi \in \Phi} J(\phi, \hat{\psi}, x_0), \qquad \hat{\psi} \in \Psi. \tag{3.11}$$

Equation (3.11) is important in the sense that it is the analytical representation
of stage I in Figure 3.3. Namely, equation (3.11) is the representation for the
player who tries to use $\phi$ to maximize $J(\phi, \psi, x_0)$ based on the estimate value
$\hat{\psi}$. Changing the cost function $J(\phi, \psi, x_0)$ is equivalent to changing the goal
selector [76] in stage I. Based on the estimate of the control strategy $\hat{\psi}$, one player
makes a control decision using the optimal criterion (3.11).
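
The decision rule (3.11) can be sketched over a finite candidate set: given an estimate of the opponent's control, the player picks the control that maximizes the cost. Everything named below is hypothetical and chosen only for illustration: the function `best_response`, the cost `toy_cost` (in which the dependence on the initial state is absorbed), and the grid of candidate controls.

```python
def best_response(cost, phi_candidates, psi_hat):
    """Return (phi, J) maximizing cost(phi, psi_hat) over the candidates."""
    best_phi = max(phi_candidates, key=lambda phi: cost(phi, psi_hat))
    return best_phi, cost(best_phi, psi_hat)


def toy_cost(phi, psi):
    """Hypothetical cost; the maximizer gains most by matching psi."""
    return -(phi - psi) ** 2 + psi


# Assumed feasible set for phi: a grid on [-1, 1] with step 0.1.
phi_grid = [k / 10.0 for k in range(-10, 11)]

# Estimate of the opponent's control, as the Identifier would supply it.
phi_star, j_hat = best_response(toy_cost, phi_grid, psi_hat=0.5)
print(phi_star, j_hat)
```

Swapping `toy_cost` for another function corresponds to changing the goal selector, while the estimate passed as `psi_hat` plays the role of the Identifier's output.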

If we exchange the symbols $\psi$ and $\phi$ in equation (3.11), it is equivalent to
exchanging $\phi$ and $\psi$ in Figure 3.3 and, in this case, represents the player who tries
to use $\psi$ to minimize $J(\phi, \psi, x_0)$ based on the estimate value $\hat{\phi}$. To represent this,
we define $J_{\hat{\phi}}$ and $J'_{\mathrm{optimal}}$:

$$J_{\hat{\phi}} = \min_{\psi \in \Psi} J(\hat{\phi}, \psi, x_0), \tag{3.12}$$

$$J'_{\mathrm{optimal}} = \max_{\hat{\phi} \in \Phi} J_{\hat{\phi}}. \tag{3.13}$$

Figure 3.4 shows the paradigm for this case. One question may be asked about
the equivalence of Figure 3.3 and Figure 3.4. That is equivalent to asking whether

$$J_{\mathrm{optimal}} = J'_{\mathrm{optimal}}. \tag{3.14}$$

A more detailed discussion on this topic will be given in the next section.

Unlike the general scheme of differential games, the enemy’s strategy need not
be optimal in this paradigm. Instead of assuming that the enemy plays optimally,
we shall consider all the possibilities that the opponent can take. This can be
seen in equations (3.9), (3.10) and (3.11). In equation (3.9), $J_\psi$ is a functional of
$\psi$, which takes all possible values from the feasible set $\Psi$.

The above paradigm assumes that the function $f(x, \phi, \psi)$ is known and that it is
continuously differentiable with respect to all its arguments. We do not eliminate
the possibility that $f(x, \phi, \psi)$ is unknown, in which case an additional identification
process is needed to get the estimate of $f(x, \phi, \psi)$. For simplicity, we assume that the
function $f(x, \phi, \psi)$ is known throughout this work.


3.3.  Interchangeability 

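In this section, the following assumption will be justified:

$$J_{\mathrm{optimal}} = J'_{\mathrm{optimal}}. \tag{3.15}$$

In general, the following statements are equivalent:

(a) $\phi$ and $\psi$ are exchangeable in Figure 3.3 and Figure 3.4.

(b) $J_{\mathrm{optimal}} = J'_{\mathrm{optimal}}$.

(c) Interchangeability:

$$\max_{\phi \in \Phi} \min_{\psi \in \Psi} J = \max_{\phi \in \Phi} \Big( \min_{\psi \in \Psi} J \Big) = \min_{\psi \in \Psi} \Big( \max_{\phi \in \Phi} J \Big),$$

subject to $\dot{x} = f(x, \phi, \psi)$.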

Before we discuss the problem in detail, let us look at a simple example [12]. Define $L(u,v) = -3uv + 2v^2$, $-1 \le u \le 1$, $-1 \le v \le 1$. It is not hard to see that

$$\max_{v}\Bigl(\min_{u} L(u,v)\Bigr) = 0, \qquad (3.16)$$

and

$$\min_{u}\Bigl(\max_{v} L(u,v)\Bigr) = 2. \qquad (3.17)$$

Therefore,

$$\max_{v}\Bigl(\min_{u} L(u,v)\Bigr) \ne \min_{u}\Bigl(\max_{v} L(u,v)\Bigr).$$

From  this,  we  know  that  it  is  not  always  true  that  statement  (c)  holds.  The 
condition  that  statement  (c)  or  (a)  holds  is  called  the  interchangeability  condition. 
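The two values in (3.16) and (3.17) can be checked numerically. The following is a minimal Python sketch, assuming the payoff reads L(u, v) = -3uv + 2v^2 on the square [-1, 1] x [-1, 1]; the grid search and the helper names `lower_value` and `upper_value` are illustrative and do not come from the report.

```python
def L(u, v):
    # Payoff of the static example (assumed reading of the formula in the text).
    return -3.0 * u * v + 2.0 * v * v


def grid(n=201):
    # Uniform grid on [-1, 1]; an odd n puts 0 exactly on the grid.
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def lower_value(n=201):
    # max over v of (min over u of L): the inner player picks u after seeing v.
    pts = grid(n)
    return max(min(L(u, v) for u in pts) for v in pts)


def upper_value(n=201):
    # min over u of (max over v of L): the inner player picks v after seeing u.
    pts = grid(n)
    return min(max(L(u, v) for v in pts) for u in pts)


if __name__ == "__main__":
    print("max-min:", lower_value())
    print("min-max:", upper_value())
```

The gap between the two printed numbers is what statement (c) rules out when the interchangeability condition holds.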

In general, consider a continuous-time system whose equation is given by

$$\dot x = A(t)x + B^1(t)u^1(t) + B^2(t)u^2(t), \quad t \ge 0, \qquad (3.18)$$

$$L(u^1,u^2) = |x(t_f)|^2_{Q_f} + \int_0^{t_f}\Bigl\{|x(t)|^2_{Q(t)} + |u^1(t)|^2 - r(t)\,|u^2(t)|^2\Bigr\}\,dt, \qquad (3.19)$$

where $Q_f \ge 0$, $Q(t) \ge 0$, $t \in [0, t_f]$, $r(t) > 0$, $t_f$ is the terminal time, and all matrices $A(t)$, $B^1(t)$, $B^2(t)$ and $Q(t)$ have piecewise continuous entries. The initial state $x_0$ is known to both players.

Let $\mathcal{M}^i$ denote the policy space of player $i$. Further, introduce the function $J: \mathcal{M}^1 \times \mathcal{M}^2 \to R$ by

$$J(\mu^1,\mu^2) = L(u^1,u^2), \qquad (3.20)$$

$$u^i = \mu^i(\cdot), \quad \mu^i \in \mathcal{M}^i, \quad i = 1, 2, \qquad (3.21)$$

where we have suppressed the dependence on the initial state $x_0$. The triplet $\{J; \mathcal{M}^1, \mathcal{M}^2\}$ constitutes the extensive form of the zero-sum dynamic game, in the context of which we can introduce the notion of a saddle-point equilibrium.

Definition 1:

Define the following quantities:

$$\bar J = \min_{\mu^1\in\mathcal{M}^1}\,\max_{\mu^2\in\mathcal{M}^2} J(\mu^1,\mu^2),$$

$$\underline J = \max_{\mu^2\in\mathcal{M}^2}\,\min_{\mu^1\in\mathcal{M}^1} J(\mu^1,\mu^2),$$

where $\bar J$ and $\underline J$ are the upper value and the lower value, respectively. Generally, we have the inequality $\bar J \ge \underline J$ in the context of static games.

Definition  2: 

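Given a zero-sum dynamic game in extensive form, a pair of policies $(\mu^{1*}, \mu^{2*}) \in \mathcal{M}^1 \times \mathcal{M}^2$ constitutes a saddle point solution if, for all $(\mu^1, \mu^2) \in \mathcal{M}^1 \times \mathcal{M}^2$,

$$J(\mu^{1*}, \mu^2) \le J^* := J(\mu^{1*}, \mu^{2*}) \le J(\mu^1, \mu^{2*}). \qquad (3.22)$$

The quantity $J^*$ above is called the value of the game, which is defined even if a saddle point solution does not exist, as

$$\bar J := \min_{\mu^1}\max_{\mu^2} J(\mu^1,\mu^2) = J^* = \max_{\mu^2}\min_{\mu^1} J(\mu^1,\mu^2) =: \underline J. \qquad (3.23)$$

Only when $\bar J$ and $\underline J$ are equal, as in eq. (3.23), is the value $J^*$ of the game defined.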


In [9], Basar considered the system described by eq. (3.18) and the cost function in eq. (3.19) and gave the following result on the open-loop saddle point solution:

Lemma 3.1 The quadratic objective functional $L(u^1,u^2)$ given by eq. (3.19), under the state equation (3.18), is strictly concave in $u^2$ for every open-loop policy $u^1$ of Player 1 if, and only if, the following Riccati differential equation does not have a conjugate point on the interval $[0, t_f]$:

$$\dot S + A'S + SA + Q + \frac{1}{r}\,S B^2 B^{2\prime} S = 0; \quad S(t_f) = Q_f. \qquad (3.24)$$

We have the following result on the open-loop saddle-point solution [9]:

Theorem 3.1 For the linear-quadratic differential game with open-loop information structure, let the condition of the above Lemma be satisfied, and introduce the following Riccati differential equation

$$\dot Z + A'Z + ZA + Q - Z B^1 B^{1\prime} Z + \frac{1}{r}\,Z B^2 B^{2\prime} Z = 0; \quad Z(t_f) = Q_f. \qquad (3.25)$$

Then,

(i) The Riccati differential equation (3.24) does not have a conjugate point on the interval $[0, t_f]$.

(ii) The game admits a unique saddle-point solution, given by

$$u^{1*}(t) = \mu^{1*}(t, x_0) = -B^1(t)'\,Z(t)\,x^*(t), \qquad (3.26)$$

$$u^{2*}(t) = \mu^{2*}(t, x_0) = \frac{1}{r(t)}\,B^2(t)'\,Z(t)\,x^*(t), \quad t \ge 0, \qquad (3.27)$$

where $x^*_{[0,t_f]}$ is the corresponding state trajectory, generated by

$$\dot x^* = \Bigl(A - \bigl(B^1 B^{1\prime} - \frac{1}{r}\,B^2 B^{2\prime}\bigr) Z(t)\Bigr)\,x^*; \quad x^*(0) = x_0. \qquad (3.28)$$

(iii) The saddle-point value of the game is

$$L^* = L(u^{1*}, u^{2*}) = x_0'\,Z(0)\,x_0. \qquad (3.29)$$

(iv) If the Riccati equation (3.24) has a conjugate point in the open interval $(0, t_f)$, then the upper value of the game is unbounded.

In this study, we consider the system described by eq. (3.18), assuming that the interchangeability condition is satisfied. Thus, we have the following:

ASSUMPTION

$$\min_{\psi}\,\max_{\phi} J = \max_{\phi}\,\min_{\psi} J.$$

As stated in the theorem above, the assumption can be checked through the conjugate points of the Riccati differential equation (3.24). If this assumption is satisfied, the game saddle point and the calculus saddle point are equivalent [12].
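As an illustration of how the conjugate-point condition might be checked, the sketch below integrates a scalar version of (3.24) backward from the terminal time and reports whether the solution escapes to infinity before reaching t = 0. It is only a sketch under stated assumptions: the system is taken to be scalar with constant coefficients, numerical blow-up past a threshold is used as a proxy for a conjugate point, and the function name and its parameters (`a`, `b2`, `q`, `r`, `q_f`) are hypothetical.

```python
def riccati_has_conjugate_point(a, b2, q, r, q_f, t_f, steps=20000, blowup=1e6):
    """Scalar form of (3.24): S' + 2*a*S + q + (b2**2 / r) * S**2 = 0, S(t_f) = q_f.

    Returns True if |S| exceeds `blowup` while integrating from t_f back to 0,
    which is used here as a numerical stand-in for a conjugate point.
    """
    def rhs(s):
        # dS/dt implied by the scalar Riccati equation.
        return -(2.0 * a * s + q + (b2 * b2 / r) * s * s)

    h = t_f / steps
    s = q_f
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step taken backward in time (step -h).
        k1 = rhs(s)
        k2 = rhs(s - 0.5 * h * k1)
        k3 = rhs(s - 0.5 * h * k2)
        k4 = rhs(s - h * k3)
        s -= (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        if abs(s) > blowup or s != s:  # s != s catches NaN
            return True
    return False


if __name__ == "__main__":
    # With a = 0, b2 = 1, q = 1, q_f = 0 the exact solution is
    # S(t) = sqrt(r) * tan((t_f - t) / sqrt(r)), which is finite on [0, 1]
    # for r = 1 but has a pole inside the interval for r = 0.25.
    print(riccati_has_conjugate_point(0.0, 1.0, 1.0, 1.0, 0.0, 1.0))
    print(riccati_has_conjugate_point(0.0, 1.0, 1.0, 0.25, 0.0, 1.0))
```

A small weight r on the second player's control makes the quadratic term dominate, which is the situation in which the interchangeability assumption can fail.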

3.4.  Algorithm 

Based on the discussions in the previous sections, several algorithms will be presented in this section. The basic idea is for one player to repeat phase 2 and phase 3 of the general configuration described in the first section while assuming no control over his opponent's strategy. For the other player, his strategy can be generated by some external generator. This process results in the one-player paradigm algorithm which will be described below. In practice, the other player may implement the same process of phase 2 and phase 3 as well, which results in the two-player paradigm algorithm. The two-player paradigm algorithm of differential games with neural networks will be discussed later in this section. The details of each algorithm will be given. One should know that the basic frame of each algorithm remains the same, namely, to repeat stage I and stage II for one particular player or all players. In the one-player paradigm algorithm, phases 2 and 3 of stage I for one player are repeated and we assume no control over the strategy of the other player. In the two-player paradigm algorithm, we implement the same process of both stage I and stage II for each player. Finally, in algorithm 3, we implement the process in stage II based on the idea discussed in Section 3.2. Figure 3.5 shows the time step for the implementation.

Figure 3.5: Time Step for One Player

In what follows, the player to whom the algorithms are applied is called "own player", and his counterpart is "the other player".


Algorithm  1  (One-player  Paradigm): 

Stage  I:  (1)  Using  the  environmental  information,  identify  the  other  player’s 
control  strategy. 

(2)  Based  on  the  estimate  of  the  other  player’s  movement  or  strategy, 
generate  a  new  control  value  for  own  player  by  using  the  optimal  control 
law. 

Stage  II:  Generate  a  new  control  value  for  the  other  player. 

REMARKS: 

1. The environmental information in stage I is usually numerical values of observable state variables, but that information could be curves generated by state variables. One way to obtain the environmental information is to first obtain the characteristics of the segment of curves, e.g. the direction information and the sharpness of the turning corner, and then match any of this information with that stored in the database. The strategy assigned to any matched segment of curves will be said to be the strategy employed to generate that segment. Details of the method will not be given here. In the remainder of this dissertation, we will use the observable variables as the environmental information in stage I.

2.  The  control  value  in  stage  II  could  be  generated  by  any  control  law  (or 
maneuver)  since  the  own  player  has  no  control  over  the  opponent  player. 
In  this  algorithm,  changing  control  law  on-line  is  allowed  for  the  opponent 
player.  The  own  player  has  the  capability,  realized  by  an  identifier  in  stage  I, 
of  keeping  track  of  the  changes  of  control  strategies  of  the  opponent  player. 

3.  In  practice,  the  difference  in  the  length  of  time  intervals  for  stage  I  and 
stage  II  could  be  significant  since  an  optimal  control  law  is  used  in  stage  I 
and  hence  the  time  required  to  compute  the  optimal  control  could  be  much 
more  than  that  in  stage  II. 

4. In our simulations, we realized the identifier and controller using neural networks. There are many publications in the literature discussing the problems of neural controllers (e.g., [4'.,, 95]). This type of controller is particularly useful for difficult control problems [3].
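The loop structure of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the report's C implementation: the function `one_player_paradigm`, its callback arguments, and the scalar toy system x_{k+1} = x_k + phi - psi used in the usage example are all assumptions made for the sake of a runnable example. The identifier and controller are plain functions here, where the report uses trained neural networks.

```python
def one_player_paradigm(x0, identify, own_control, other_control, step, n_periods):
    """Repeat Stage I (identify, then control) and Stage II (opponent moves)."""
    x = x0
    history = []  # what the own player can observe: (state, own control, next state)
    for k in range(n_periods):
        psi_hat = identify(history)      # Stage I (1): estimate the opponent's control
        phi = own_control(x, psi_hat)    # Stage I (2): own control from the estimate
        psi = other_control(k, x)        # Stage II: opponent control, any law allowed
        x_next = step(x, phi, psi)
        history.append((x, phi, x_next))
        x = x_next
    return x, history


# Hypothetical toy ingredients for the usage example.
def toy_step(x, phi, psi):
    return x + phi - psi


def toy_identify(history):
    if not history:
        return 0.0                       # no information yet
    x, phi, x_next = history[-1]
    return x + phi - x_next              # invert the toy dynamics for psi


def toy_control(x, psi_hat):
    return psi_hat - x                   # drive the predicted next state to zero


def toy_opponent(k, x):
    return 0.5                           # constant strategy, unknown to the own player


if __name__ == "__main__":
    x_final, hist = one_player_paradigm(2.0, toy_identify, toy_control,
                                        toy_opponent, toy_step, 4)
    print(x_final)
```

In the toy run the own player misses in the first period, because it has no estimate yet, and cancels the opponent's constant strategy from the second period on.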

In what follows, we give a slightly different algorithm called a two-player paradigm algorithm. By assuming that two players use the same process of identification and control, we can construct, in Figure 3.6, a different paradigm of differential games.

In the following two algorithms, we assume that two players are engaged in the entire encounter. For simplicity and clarity, we write "Player A" for one player and "Player B" for the other.

Algorithm 2 (Two-player Paradigm):

Stage I: (1) Using environmental information, identify player B's strategy.

(2) Based on the estimate of player B's movement or strategy, generate a new control law for player A by using optimal control theory.

Stage II: (1) Using environmental information, identify player A's strategy.

(2) Based on the estimate of player A's strategy, generate a new control law for player B using optimal control theory.


Last, we give an algorithm based on the discussions of the last section. In this case, the optimal control in stage II could be very complicated and highly nonlinear even though the original system is linear. This will be illustrated in a pursuit-evasion game problem in the next section.

Suppose that the system for two players is described by

$$\dot x = f(x, \phi, \psi), \qquad (3.30)$$

where $f(x, \phi, \psi)$ is a vector of differentiable functions, $x$ is the state vector of dimension $n$ ($n$ is a positive integer), $\phi$ is the control strategy for player A and $\psi$ is the control strategy for player B.

The goal for both players is to maximize (or minimize) a cost function given by

$$\max_{\phi}\,\min_{\psi} J(\phi, \psi). \qquad (3.31)$$

Again, we use $J_{\hat\psi}$ to denote the optimal value for $J(\phi, \psi)$ if $\hat\psi$, the estimate of player B's strategy, is known, that is

$$J_{\hat\psi} = \max_{\phi} J(\phi, \hat\psi). \qquad (3.32)$$

With these notations, we have the following:

Algorithm 3 (Optimal Control Paradigm):

Stage I: (1) Using the environmental information, identify player B's strategy.

(2) Based on the estimate of player B's strategy, generate a control strategy by using optimal control theory.

Stage II: From the strategy used in stage I, construct a new problem

$$\max_{\psi} J_{\hat\psi} \quad (\text{or } \min_{\psi} J_{\hat\psi}),$$

subject to $\dot x = f(x, \phi, \psi)$,
3.5. A Pursuit-Evasion Game Problem

3.5.1.  The  Problem 

In a pursuit-evasion game [12], the pursuer's control is his acceleration, $a_p(t)$, normal to the initial line of sight (ILOS) to the evader. The evader's control is his acceleration, $a_e(t)$, also normal to the ILOS. The relative velocity along the ILOS is such that the nominal time of closest approach is $t_f$. If $v(t)$ is the relative velocity normal to the ILOS, and $y(t)$ is the relative displacement normal to the ILOS, the equations of motion are

$$\dot v = a_p - a_e, \quad v(t_0) = v_0; \qquad (3.33)$$

$$\dot y = v, \quad y(t_0) = 0. \qquad (3.34)$$

The pursuer wishes to minimize the terminal miss, $|y(t_f)|$, whereas the evader wishes to maximize it, so the performance index may be taken as

$$J = \frac{1}{2}\,[y(t_f)]^2. \qquad (3.35)$$

The accelerations of the pursuer and the evader are limited:

$$|a_p| \le a_{pm}, \quad |a_e| \le a_{em}, \qquad (3.36)$$

where $a_{pm} > a_{em}$. The solution proceeds by first forming the Hamiltonian

$$H = \lambda_v (a_p - a_e) + \lambda_y v. \qquad (3.37)$$

The adjoint equations are then

$$\dot\lambda_v = -\lambda_y, \quad \lambda_v(t_f) = 0, \qquad (3.38)$$

$$\dot\lambda_y = 0, \quad \lambda_y(t_f) = y(t_f), \qquad (3.39)$$

and the optimality conditions are

$$a_p(t) = -a_{pm}\,\mathrm{sgn}\,\lambda_v, \qquad (3.40)$$

$$a_e(t) = -a_{em}\,\mathrm{sgn}\,\lambda_v. \qquad (3.41)$$

The adjoint equations are easily integrated to yield

$$\lambda_v(t) = (t_f - t)\,y(t_f), \qquad (3.42)$$

$$\lambda_y(t) = y(t_f) = \text{const}. \qquad (3.43)$$

It is, therefore, clear that

$$\mathrm{sgn}\,\lambda_v(t) = \mathrm{sgn}\,y(t_f) = \text{const}. \qquad (3.44)$$

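Substituting eq. (3.44) into eq. (3.40) and eq. (3.41), and eqs. (3.40), (3.41) into eq. (3.33) and eq. (3.34) yields a simple set of differential equations whose solution may be written as

$$y(t_f) = v_0 (t_f - t_0) - \frac{1}{2}(a_{pm} - a_{em})(t_f - t_0)^2\,\mathrm{sgn}\,y(t_f), \qquad (3.45)$$

which can be used for determining $y(t_f)$. Thus, we have

$$y(t_f) = \begin{cases} v_0 (t_f - t_0) - \frac{1}{2}(a_{pm} - a_{em})(t_f - t_0)^2, & \text{if } \dfrac{2 v_0}{(t_f - t_0)(a_{pm} - a_{em})} > 1, \\[2ex] v_0 (t_f - t_0) + \frac{1}{2}(a_{pm} - a_{em})(t_f - t_0)^2, & \text{if } \dfrac{2 v_0}{(t_f - t_0)(a_{pm} - a_{em})} < -1. \end{cases}$$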


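Substituting the above equation into eq. (3.40) yields the control for the pursuer as follows

$$a_p(t) = \begin{cases} -a_{pm}, & \text{if } a_1 > 1, \\ a_{pm}, & \text{if } a_1 < -1, \\ \text{[expression illegible in the source]}, & \text{if } -1 \le a_1 \le 1, \end{cases} \qquad (3.46)$$

where

$$a_1 = \frac{2 v_0}{(t_f - t_0)(a_{pm} - a_{em})}, \qquad A = -v(t_0) - \frac{2\,y(t_0)}{t_f - t_0}.$$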
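The implicit relation (3.45) can be resolved case by case, as in the following Python sketch. The two outer cases follow directly from (3.45). The middle case, in which the function returns zero miss, is an assumption of this sketch (the pursuer is taken to be able to null the miss when its acceleration advantage is large enough) and is not a formula taken from the report. The function name `terminal_miss` is hypothetical.

```python
def terminal_miss(v0, t0, tf, a_pm, a_em):
    """Terminal miss y(tf) when both players use the bang-bang controls (3.40)-(3.41)."""
    if a_pm <= a_em:
        raise ValueError("the derivation assumes a_pm > a_em")
    T = tf - t0
    drift = v0 * T                        # displacement with no control at all
    c = 0.5 * (a_pm - a_em) * T * T       # what the net acceleration can remove
    if drift > c:                         # same as a_1 > 1, so y(tf) > 0
        return drift - c
    if drift < -c:                        # same as a_1 < -1, so y(tf) < 0
        return drift + c
    return 0.0                            # assumed: the pursuer can null the miss


def satisfies_345(y_f, v0, t0, tf, a_pm, a_em, tol=1e-9):
    # Check a candidate y(tf) against the implicit relation (3.45).
    T = tf - t0
    sgn = (y_f > 0) - (y_f < 0)
    return abs(y_f - (v0 * T - 0.5 * (a_pm - a_em) * T * T * sgn)) < tol


if __name__ == "__main__":
    y_f = terminal_miss(10.0, 0.0, 2.0, 3.0, 1.0)
    print(y_f, satisfies_345(y_f, 10.0, 0.0, 2.0, 3.0, 1.0))
```

The pursuer's control then follows from eq. (3.40) as minus a_pm times the sign of the returned value in the two outer cases.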
3.5.2. Solution By Using Neural Networks

From the last section, we know that the differential game solution for $a_p$ is

$$a_p(t) = -a_{pm}\,\mathrm{sgn}[y(t_f)]. \qquad (3.47)$$

The one-player paradigm algorithm will be used for our study, in which $a_e$ is either a fixed value or a continuous function of time (see Figure 3.7).

Figure 3.7: A Typical Function for $a_e$

In what follows, we assume that $a_e(t)$ is an independent variable. We shall show that the control strategy $a_p(t)$ for the pursuer is actually a function of $a_e(t)$. From equation (3.47), one can readily see that $a_p(t)$ is constant over the interval $[t_0, t_f]$. The sign of $a_p(t)$ depends on the sign of $y(t_f)$. Therefore, the explicit expression for $y(t)$ is necessary.

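From eq. (3.33), we know

$$v(t) = v(t_0) + \int_{t_0}^{t} a_p(\tau)\,d\tau - \int_{t_0}^{t} a_e(\tau)\,d\tau, \qquad (3.48)$$

which yields from eq. (3.47)

$$v(t) = v(t_0) - a_{pm}\,\mathrm{sgn}[y(t_f)]\,(t - t_0) - \int_{t_0}^{t} a_e(\tau)\,d\tau. \qquad (3.49)$$

Letting

$$\bar a_e(t) = \int_{t_0}^{t} a_e(\tau)\,d\tau,$$

we have

$$v(t) = v(t_0) - a_{pm}\,\mathrm{sgn}[y(t_f)]\,(t - t_0) - \bar a_e(t). \qquad (3.50)$$

It follows from eqs. (3.34) and (3.50) that

$$\begin{aligned} y(t) &= \int_{t_0}^{t} v(\tau)\,d\tau \\ &= \int_{t_0}^{t} \bigl[v(t_0) - a_{pm}\,\mathrm{sgn}[y(t_f)]\,(\tau - t_0) - \bar a_e(\tau)\bigr]\,d\tau \\ &= v(t_0)(t - t_0) - \frac{a_{pm}}{2}\,\mathrm{sgn}[y(t_f)]\,(t - t_0)^2 - \int_{t_0}^{t} \bar a_e(\tau)\,d\tau. \end{aligned} \qquad (3.51)$$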


(3.52) 


It follows from eqs. (3.51) and (3.52) that $y(t)$ depends on the time function $a_e(t)$ ($t_0 \le t \le t_f$), which further implies from eq. (3.47) that $a_p(t)$ ($t_0 \le t \le t_f$) depends on the function $a_e(t)$. This observation further justifies the statement of phase 3 in the general configuration in Section 3.1.

Next, let us consider the optimization problem

$$\min_{a_p} J(t_0, a_p, \hat a_e), \qquad (3.53)$$

subject to

$$\dot v = a_p(t) - \hat a_e, \quad v(t_0) = v_0, \qquad (3.54)$$

$$\dot y = v, \quad y(t_0) = 0. \qquad (3.55)$$

Its solution is denoted by $a_p^*(t)$, which is a functional of $\hat a_e(t)$. Please note that in this optimization problem $\hat a_e(t)$ is the estimate of $a_e(t)$.

We further consider the optimization problem

$$\max_{a_e} J(t_0, a_p^*, a_e), \qquad (3.56)$$

subject to

$$\dot v = a_p^*(t) - a_e(t), \quad v(t_0) = v_0, \qquad (3.57)$$

$$\dot y = v, \quad y(t_0) = 0. \qquad (3.58)$$

If we denote $\mathrm{sgn}[y(t_f)]$ by $S_{a_e}(t)$, which apparently means a functional of $a_e(t)$, it is interesting to see that

$$a_p^*(t) = -a_{pm}\,S_{a_e}(t), \qquad (3.59)$$

and thus the system equations for the optimization problem (3.56), (3.57) and (3.58) become

$$\dot v = -a_{pm}\,S_{a_e}(t) - a_e(t) = f(a_e(t)), \qquad (3.60)$$

and

$$\dot y = v(t). \qquad (3.61)$$

Therefore, we know from eq. (3.60) that the optimization problem (3.56), (3.57) and (3.58) now becomes a nonlinear optimization problem even though the original one is linear.

Solving the above nonlinear optimization problem for an arbitrary function $a_e(t)$ is usually very difficult. In our case, we only consider the following function

$$a_e(t) = -a_e\,\mathrm{sgn}[y(t_f)], \qquad (3.62)$$

where $a_e$ is assumed to have an arbitrary value between zero and $a_{em}$.

Substituting eq. (3.62) into eq. (3.52), we have

$$\int_{t_0}^{t}\!\int_{t_0}^{\tau} a_e(\rho)\,d\rho\,d\tau = -\int_{t_0}^{t}\!\int_{t_0}^{\tau} a_e\,\mathrm{sgn}[y(t_f)]\,d\rho\,d\tau = -a_e\,\mathrm{sgn}[y(t_f)]\,\frac{1}{2}(t - t_0)^2, \qquad (3.63)$$

which, together with eq. (3.51), gives

$$y(t) = v(t_0)(t - t_0) - \frac{1}{2}\,a_{pm}\,\mathrm{sgn}[y(t_f)]\,(t - t_0)^2 + a_e\,\mathrm{sgn}[y(t_f)]\,\frac{1}{2}(t - t_0)^2. \qquad (3.64)$$

By comparing eq. (3.45) with eq. (3.64), we can clearly see the reason why we have chosen this special form (3.62). The only difference between eq. (3.45) and eq. (3.64) is that the maximal value of $a_e(t)$ is used in eq. (3.45). An immediate benefit is that the formula for determining the sign of $y(t_f)$ has the same form as that of the last section. Thus, by replacing $a_{em}$ by $a_e$, we can use eq. (3.45) to determine the sign of $y(t_f)$. Although we have such a nice representation when the special form (3.62) is used, we do not have to confine ourselves to it. In fact, any type of integrable function may be used for $a_e(t)$.
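The closed form (3.64) can be cross-checked against a direct integration of the equations of motion (3.33) and (3.34). The Python sketch below is an illustration only; it assumes the sign of y(t_f) is supplied as an input (`sign_yf`) and that both accelerations are constant over the interval, as in the special form (3.62). The function names are hypothetical.

```python
def simulate_y(v0, a_pm, a_e, sign_yf, t0, tf, steps=1000):
    """Integrate v' = a_p - a_e(t), y' = v with the constant controls
    a_p = -a_pm * sign_yf from (3.47) and a_e(t) = -a_e * sign_yf from (3.62)."""
    a_p = -a_pm * sign_yf
    a_e_t = -a_e * sign_yf
    acc = a_p - a_e_t
    h = (tf - t0) / steps
    v, y = v0, 0.0
    for _ in range(steps):
        y += v * h + 0.5 * acc * h * h    # exact update for constant acceleration
        v += acc * h
    return y


def closed_form_y(v0, a_pm, a_e, sign_yf, t0, tf):
    # Right-hand side of (3.64) evaluated at t = tf.
    T = tf - t0
    return v0 * T - 0.5 * a_pm * sign_yf * T * T + 0.5 * a_e * sign_yf * T * T


if __name__ == "__main__":
    args = (10.0, 3.0, 1.0, 1, 0.0, 2.0)
    print(simulate_y(*args), closed_form_y(*args))
```

In the printed example the resulting y(t_f) is positive, so the supplied sign is consistent with the outcome.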
3.5.3. Neural Identifier

From the previous subsections, it is known that $a_p^*(t)$ is a functional of $a_e(t)$. Thus, identification of $a_e$ is essential for our work when $a_e$ is unknown. In this subsection, the identification is accomplished by a Neural Identifier which is realized by a trained neural network.

In our problem, a neural identifier is a generic mapping between two domains. The first one is the Environmental Information Domain (EID) which is usually accessible. For our problem, the other domain is the one-dimensional space $R$ if the system which we are considering is a single-input control system. The environmental information from EID is usually the numerical values of observable state variables or curves of state variables. The mapping is so defined that for each piece of information from EID there is one and only one point in $R$ which uniquely determines that piece of information. The function of the Neural Identifier is thus to determine the point in $R$ which yields that piece of information. The way to determine the point in $R$ yielding the environmental information is essentially the method of training the network. We shall discuss the Phase-plane training method, which will be used in our simulations.

Phase-plane Method

The function of the Neural Identifier is to estimate the strategy for the evader, $a_e(t)$, based on the environmental information. In view of the fact that a player performing identification cannot make any control decision, we may assume that $a_p = 0$ (or any other real number) during the Identification Process. Then, we have the following simplified system equations

$$\dot v = -a_e, \qquad (3.65)$$

Denoting

$$\Delta y(t_0) = y(t) - y(t_0),$$

$$\Delta v(t_0) = v(t) - v(t_0),$$

Figure 3.9: Phase-plane Method for Identification (inputs $v(t_0)$, $\Delta v(t_0)$, $\Delta y(t_0)$; output $\hat a_e$)


As mentioned before, typical types of environmental information are the numerical values of the observable state variables or the curves generated by them. A trajectory in the phase plane for positive values of $a_e$ is shown in Figure 3.10. Every time a pursuer performs the identification using the Neural Identifier, only the increments of the state variables $y(t)$ and $v(t)$, measured from the time when he begins the identification, and the value of $v(t)$ at that time are fed into the network. The output of the network will be the estimate of $a_e$.
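The mapping that the Neural Identifier is meant to learn can be written in closed form for this simple system. With $a_p = 0$ and constant $a_e$, the simplified dynamics give $\Delta v = -a_e \Delta t$ and $\Delta y = v(t_0)\Delta t - \frac{1}{2} a_e \Delta t^2$; eliminating $\Delta t$ yields $a_e = -\Delta v\,(2 v(t_0) + \Delta v) / (2 \Delta y)$. The Python sketch below implements this relation. It is derived here from the equations of motion and may differ in form from the report's own identification formula; the function name `identify_ae` is hypothetical.

```python
def identify_ae(v0, dv, dy):
    """Estimate a constant evader acceleration from phase-plane increments.

    v0 : relative velocity when identification starts
    dv : increment of v since then
    dy : increment of y since then
    Assumes a_p = 0 and a_e constant during the identification interval.
    """
    if dy == 0.0:
        raise ValueError("dy = 0 carries no information about a_e in this formula")
    return -dv * (2.0 * v0 + dv) / (2.0 * dy)


if __name__ == "__main__":
    # Forward-simulate one identification interval with a known a_e, then invert.
    a_e, v0, dt = 1.0, 2.0, 1.0
    dv = -a_e * dt
    dy = v0 * dt - 0.5 * a_e * dt * dt
    print(identify_ae(v0, dv, dy))
```

A trained network approximates this inverse map from the three inputs to the estimate, which is why the inputs listed above are sufficient.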
3.5.4. Training the Network

A diagram of the Neural Identifier is depicted in Figure 3.11. As mentioned above, we will use the Phase-plane method to generate the training set for the Neural Identifier. For our convenience, we rewrite the formula as follows

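$$\hat a_e = -\frac{\Delta\hat v_{t_0}\,(2\hat v_{t_0} + \Delta\hat v_{t_0})}{2\,\Delta\hat y_{t_0}}, \qquad (3.72)$$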

where the quantities $\Delta\hat v_{t_0}$, $\hat v_{t_0}$ and $\Delta\hat y_{t_0}$ are used in place of $\Delta v_{t_0}$, $v_{t_0}$ and $\Delta y_{t_0}$ for the reason below.

The reason why we use $\hat v_{t_0}$, $\Delta\hat v_{t_0}$ and $\Delta\hat y_{t_0}$ in place of $v_{t_0}$, $\Delta v_{t_0}$ and $\Delta y_{t_0}$ for identifying the strategy $a_e$ is that the information coming from any practical system is always corrupted by some type of noise, although the noise might be small. This substitution is illustrated in Figure 3.12, where the observable state variables are converted by some device, called device "I", into some variables suitable for the purpose of identification.

Figure 3.11: Block Diagram for Neural Identifier

Figure 3.12: Structure for Generating the Training Set


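In the ideal case in which the noise is zero, i.e. $\hat v_{t_0} = v_{t_0}$, $\Delta\hat v_{t_0} = \Delta v_{t_0}$, $\Delta\hat y_{t_0} = \Delta y_{t_0}$, we shall have from eq. (3.72) that

$$\hat a_e = -\frac{\Delta v_{t_0}\,(2 v_{t_0} + \Delta v_{t_0})}{2}\left(-\frac{2 a_e}{\Delta v_{t_0}\,(2 v_{t_0} + \Delta v_{t_0})}\right) = a_e. \qquad (3.73)$$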


Thus, $\hat a_e$ becomes $a_e$ in the ideal case. The above discussion implies that the Neural Identifier functions as an inverse system of the original system. Therefore, it follows that we need to use the Phase-plane Method to generate the training set for the Neural Identifier with noise-corrupted inputs. The necessity of this assumption will be shown in the following discussion. We know that there will always be some errors, no matter how small they are, between the ideal mapping of equation (3.71) and the one approximated by the Neural Identifier. These errors can be viewed as some sort of noise added to the system under the assumption that the Neural Identifier is an ideal mapping of equation (3.71). Thus, we may assume that the Neural Identifier is an ideal mapping for equation (3.71), and that the errors between the ideal mapping of equation (3.71) and the actual identifier are due to inputting imperfect information (corrupted by noise). This idea provides a basis for the method of generating the training set for the Neural Identifier. We simulated the noise by generating random numbers with a given variation. The random numbers are added to $v_{t_0}$, and thus $\hat v_{t_0}$ is obtained. Note that $\Delta v_{t_0} = -1$ since $a_e = 1$ and $a_p = 0$ during the Identification Process, and $\Delta y_{t_0}$ is the output of the system driven by $a_e$ (see Figure 3.13). The values of $\hat v_{t_0}$, $\Delta\hat v_{t_0}$ and $\Delta\hat y_{t_0}$ are put in each input line of the training set, and the value computed using formula (3.71) is the desired output of the Neural Identifier corresponding to those particular $\hat v_{t_0}$, $\Delta\hat v_{t_0}$ and $\Delta\hat y_{t_0}$. Repeating these input lines and desired output lines forms a training set for the Neural Identifier. Please note that, instead of trying to approximate the mapping of eq. (3.71) for arbitrary inputs, we approximate the mapping on some specific domains which are the outputs of the original system driven by $a_e$.
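The procedure for building the training set can be sketched in Python as follows. Several details are assumptions made for illustration and are not taken from the report: the noise is drawn from a Gaussian distribution, it is added only to the initial velocity, the starting velocities are sampled uniformly from [-5, 5], and the names `make_training_set` and `phase_plane_target` are hypothetical. The target is computed from the noisy inputs with the phase-plane relation $a_e = -\Delta v\,(2 v + \Delta v)/(2\Delta y)$, derived from the simplified dynamics with $a_p = 0$.

```python
import random


def phase_plane_target(v0, dv, dy):
    # Desired identifier output for one input line (phase-plane relation).
    return -dv * (2.0 * v0 + dv) / (2.0 * dy)


def make_training_set(n, a_e=1.0, dt=1.0, noise_std=0.05, seed=0):
    """Generate (inputs, target) pairs from the system driven by a_e with a_p = 0."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v0 = rng.uniform(-5.0, 5.0)            # assumed range of starting velocities
        dv = -a_e * dt                         # increment of v over the interval
        dy = v0 * dt - 0.5 * a_e * dt * dt     # increment of y over the interval
        if abs(dy) < 1e-9:
            continue                           # skip degenerate samples
        v0_hat = v0 + rng.gauss(0.0, noise_std)  # noise-corrupted observation
        rows.append(((v0_hat, dv, dy), phase_plane_target(v0_hat, dv, dy)))
    return rows


if __name__ == "__main__":
    for inputs, target in make_training_set(3):
        print(inputs, target)
```

With `noise_std=0.0` every target equals the true a_e, which mirrors the ideal case of eq. (3.73).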

The structure and parameters of the Neural Identifier are summarized as follows:

Type of neural network: Hetero-Association,

Control strategy: Backpropagation,

Learning rule: Delta-rule,

Transfer function: Sigmoid,

Scale: 1.0,

Summation: Sum,

Learning rate: Coe1 = 0.9, Coe2 = 0.6.


Figure 3.13: Generating Data for Neural Identifier


3.5.5.  Simulation  Results 


The simulations were carried out on a Sun 4/260 workstation under UNIX. All programs are written in C and the outputs are numerical values. The parameters of the system used in the simulation are summarized in Table 3.1. In our simulations, the value of $a_e$ is fixed and the pursuer is to identify the value $a_e$. As shown in Figure 3.14, the simulation is carried out in several different periods (that is, period 1, period 2, etc.). In each period, the index is incremented from one to the maximum value state J .limit. The goal for the pursuer is to compute control values based on the estimate of $a_e$ such that the value of $y$ at the end point of each period is minimized. In each individual period, the first interval (denoted by I in Figure 3.14) is for the pursuer to obtain the estimate of $a_e$ and, in our simulation, $a_p = 0$. The maximum number of periods in entire simulation is the
(a) no disturbance (solid line),

(b) the case [iv] (dash line),

(c) same as case [iv] except that -10.0 for $v_{t_0}$ and +10.0 for $\Delta y_{t_0}$ (dashdot line).

Figure 3.15: Simulation Plots for Differential Games with Neural Networks


The output data for these different cases are shown in Table 3.2. In Table 3.2, $\hat a_e$ denotes the estimate of $a_e$ and $a_p^*$ is the optimal control value for the pursuer. Using the parameters in Table 3.1, we know that at steps 0, 5, 10, 15, 20 the pursuer gets the estimate of $a_e$. Then, in the first period (step 1 through step 5), in the second period (step 6 through step 10), in the third period (step 11 through step 15), and in the fourth period (step 16 through step 20), the pursuer computes the optimal control value $a_p^*$ for each of these periods. The values of the states $y$ and $v$ are also given in Table 3.2. Shown in Figure 3.15 are the curves plotted using Matlab software. The codes for the five cases are similar except for minor differences.


Table  3.2:  Output  Data  for  Simulation 
given (see Simu.back). As shown in Figure 3.15, there always exists a jump in $y$ between the beginning and the end of each "Identification Process" (please notice the values of y(5) and y(6), the values of y(10) and y(11), and the values of y(15) and y(16), in each of the first four cases). This is because we set $a_p = 0$ during the "Identification Process"; therefore, the system runs freely. One may argue that the magnitude of the jump in $y$ in the "Identification Process" can be reduced if we keep $a_p$ at the previous value, instead of zero. But it turns out not to be true (see Figure 3.17).


Figure 3.17: Programs Simu.back.c Without Previous $a_p$ and That With Previous $a_p$


Different simulation results will be compared here (see Figure 3.15 and Figure 3.18). The result for program Simu3.c exhibits the best performance among the experiments done in the sense that the value $|y(t_f)|$ has achieved the minimum (see y(5), y(10), y(15) and y(20) respectively from the table). The results are reasonable because in program Simu3 only one quantity is disturbed by noise, while in program Simu4.c both quantities $\Delta v_{t_0}$ and $\Delta y_{t_0}$ are disturbed by noise, and additional constant disturbances are added to $v_{t_0}$ and $\Delta y_{t_0}$ in program Simu3.d.c. The results are also explained partly by the accuracy of the estimate of $a_e$, i.e. $\hat a_e$. Since the program Simu3.c has the highest accuracy, it has a better performance than the others.


Figure  3.18:  Comparison  of  Simulation  Results 


The simulation results show the feasibility of this approach. One may wish to apply the algorithms presented in this work to more complicated cases: highly nonlinear systems, systems corrupted by noise, or even systems with uncertainty. What should be kept in mind is that both sides of the game can implement the same process. Therefore, in some sense, whoever has the better information about what his opponent is doing, or a more accurate estimate of his opponent's strategy, would eventually win the game. Using the principle outlined in the general configuration of this chapter, we can also implement the other algorithms presented in this chapter, e.g. the "Two-player Algorithm", in the same or even more complicated systems. When carrying out this further research, we should always keep in mind that the approach presented here is no longer the same as traditional approaches to differential game problems. It deals with a broader range of problems because it allows both players to take ANY strategy. For this reason, it can be expected that more properties exist about the optimal trajectories, controls and feasible sets, which may also be a direction for further research.

3.6.  Learning  Algorithm  of  Feedback  Control 

In the previous sections, the approach to differential games with neural networks, different paradigms and a case study example have been thoroughly discussed. The approach is based on the paradigm of semantic control [76]. Two neural networks, called Neural Identifier and Neural Controller, were used in each of these paradigms. The neural identifier identifies the control strategy of the opponent player based on the environmental information. The neural controller, on the other hand, gives the control strategy for the own player. The subsequent sections mainly discuss how to construct and to train the neural controller. A rigorous mathematical derivation for the weight updating rule of the neural controller will be given.

In [7], a feedback control law in the class of L-layer neural networks is given to control the discrete-time system such that viability conditions are satisfied. The viability conditions are described by a subset $K$ of the state space such that $d_K(x_{n+1}) = 0$ for all $n \ge 0$, where $d_K(x_{n+1})$ is the distance between the state and the subset $K$. The strategy underlying the external algorithm is to apply the gradient method for the minimization of $d_K$. The network learns or will learn whenever the states lie outside of the domain $K$.

An extension to the differential game problems is considered. Based on the one-player paradigm described in Section 3.2, a differential game problem can be converted to an optimization problem provided that the estimate of the control strategy for the enemy has been obtained. A similar approach to that in [7] will be used for the differential game problems discussed in the previous sections. However, instead of minimizing $d_K$, minimizing an arbitrary cost function $J$ is considered. A detailed formula for updating the weights for three-layer networks will be given.

3.6.1.  The  Updating  Rule 

Let us now consider a discrete-time version of the differential game problem,
defined as follows:

$$\max \min J_{n+1}(x_{n+1}, \psi_n, \phi_n), \qquad (3.74)$$

subject to the difference equation

$$x_{n+1} = f(x_n, \psi_n, \phi_n), \quad x_0 \text{ is given}, \qquad (3.75)$$

where $\phi_n = (\phi_{1,n}, \ldots, \phi_{l,n})$, $\psi_n = (\psi_{1,n}, \ldots, \psi_{m,n})$,
$x_n = (x_{1,n}, \ldots, x_{d,n})^T$, $\psi_n \in \Psi$, $\phi_n \in \Phi$, $x_n \in X$,
$X$ is the state space, $\Psi$ and $\Phi$ are the feasible subspaces
in the control space $Z$ for the control strategies $\psi_n$ and $\phi_n$ respectively,
and $f$ is a $C^1$ mapping from $X \times Z$ to $X$.

We can define a quantity $\hat{J}_{n+1}$, belonging to a subspace of the value space
of $J$, as follows:

$$\hat{J}_{n+1}(x_{n+1}, \phi_n) = J_{n+1}(x_{n+1}, \hat{\psi}_n, \phi_n), \qquad (3.76)$$

where $\hat{\psi}_n$ is the estimated value of $\psi_n$.
Having obtained $\hat{\psi}_n$, we can state the original problem as follows:

$$\min_{\phi_n \in \Phi} \hat{J}_{n+1}(x_{n+1}, \phi_n) = \min_{\phi_n \in \Phi} J_{n+1}(x_{n+1}, \hat{\psi}_n, \phi_n), \qquad (3.77)$$

subject to

$$x_{n+1} = f(x_n, \hat{\psi}_n, \phi_n), \qquad (3.78)$$

where $\phi_n \in \Phi$. We are now trying to find $\phi_n$ such that the cost
$J_{n+1}(x_{n+1}, \hat{\psi}_n, \phi_n)$ is minimized. A feedback control law is
obtained through an L-layer neural network

$$\phi_n = \Phi_L(W_1(n), W_2(n), \ldots, W_L(n), x_n), \qquad (3.79)$$


where $\Phi_L(W_1(n), W_2(n), \ldots, W_L(n), x_n)$ denotes the propagation of a signal $x$ in
a neural network, and $W_k(n)$ is the synaptic matrix associated with the network
layers $k-1$ and $k$ at step $n$. Let us denote layer $i$ by $P^i$ and the number of nodes
in layer $P^i$ by $n_i$, $i = 0, \ldots, L$. For a single-input control system, the number of
outputs of the neural network should be one, that is, $n_L = 1$. Using the same
notation, we can easily see that $W_i(n) \in R^{n_i \times n_{i-1}}$, $i = 1, \ldots, L$. For compactness,
denote the weight matrix by $W(n) = (W_1(n), W_2(n), \ldots, W_L(n))$. Each $W_\eta(n)$ can
also be written as $W_\eta(n) = (w^{\eta}_{ij}(n))_{n_\eta \times n_{\eta-1}}$, $\eta = 1, \ldots, L$. Moreover, let us define
the output of the $i$th neuron in the $j$th layer by $x_i^j$. Thus, the output of the first
neuron in the last layer should be denoted by $x_1^L$. The transfer function for the
$i$th neuron in the $j$th layer is denoted by $g^{i}_{j-1}(\cdot)$. Thus, if we consider a
single-output network, the transfer function for the single output neuron is $g^{1}_{L-1}(\cdot)$, or
for short, $g_{L-1}(\cdot)$. For our convenience, we also use the compact form $g_j(\cdot)$ to
denote $(g_j^{1}(\cdot), \ldots, g_j^{n_{j+1}}(\cdot))$.


With this notation, we can define a gradient learning rule, which is actually
the back-propagation learning rule,

$$w^{\eta}_{ij}(n+1) = w^{\eta}_{ij}(n) - \alpha^{\eta}_{ij} \frac{\partial J_{n+1}}{\partial w^{\eta}_{ij}(n)}, \qquad (3.80)$$

where $\alpha^{\eta}_{ij}$ is the gradient step factor. For this standard formula, the computational
burden is mainly the calculation of $\partial \Phi_L / \partial w^{\eta}_{ij}$, whose computational load increases
dramatically as $L$ becomes larger.


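For simplicity, we only consider the case in which the system (3.75) is a
single-input system. In this case, $n_L = 1$. Note that

$$\frac{\partial J_{n+1}}{\partial w^{\eta}_{ij}(n)} = \sum_{k=1}^{d} \frac{\partial J_{n+1}}{\partial f_k} \frac{\partial f_k}{\partial w^{\eta}_{ij}(n)} \qquad (3.81)$$

and

$$\frac{\partial f_k}{\partial w^{\eta}_{ij}(n)} = \frac{\partial f_k}{\partial \phi_n} \frac{\partial \Phi_L}{\partial w^{\eta}_{ij}(n)}. \qquad (3.82)$$

Therefore, the key step is to compute $\partial \Phi_L / \partial w^{\eta}_{ij}$, which contributes the heaviest
computational load. We only consider the case of a three-layer, single-output
network since it is the most common case encountered.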


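At step $n$, the outputs of neurons in each layer are given by

$$x_1^0 = x_{1,n}, \quad \ldots, \quad x_d^0 = x_{d,n},$$

$$x_i^{\eta} = g^{i}_{\eta-1}\Big(\sum_{k=1}^{n_{\eta-1}} w^{\eta}_{ik}(n)\, x_k^{\eta-1}\Big), \quad i = 1, \ldots, n_\eta, \quad \eta = 1, \ldots, L.$$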


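For simplicity, we will ignore the dependence of $x_j^i$, $i = 0, \ldots, L$, $j = 1, \ldots, n_i$, on the
integer $n$. In what follows, we will derive the formulas for computing the quantities
$\partial \Phi_L / \partial w^{\eta}_{ij}$. Again, for simplicity, we drop the notation $n$ in $w^{\eta}_{ij}(n)$ and
write $w^{\eta}_{ij}(n)$ as $w^{\eta}_{ij}$.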


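For $\eta = L$, we have $\Phi_L = g_{L-1}\big(\sum_{k=1}^{n_{L-1}} w^{L}_{1k} x_k^{L-1}\big)$, where $g_{L-1}$ is
the transfer function for the output layer, which is usually a sigmoidal function,
and $x^{L-1} = (x_1^{L-1}, \ldots, x_{n_{L-1}}^{L-1})$. Elementary calculation yields

$$\frac{\partial \Phi_L}{\partial w^{L}_{1j}} = g'_{L-1}\Big(\sum_{k=1}^{n_{L-1}} w^{L}_{1k} x_k^{L-1}\Big)\, x_j^{L-1}. \qquad (3.83)$$

For $\eta = L-1$, we similarly have

$$x^{L-1} = g_{L-2}(W_{L-1} x^{L-2}). \qquad (3.84)$$

Thus,

$$\Phi_L = g_{L-1}\Big(\sum_{k=1}^{n_{L-1}} w^{L}_{1k}\, g^{k}_{L-2}\Big(\sum_{l=1}^{n_{L-2}} w^{L-1}_{kl} x_l^{L-2}\Big)\Big), \qquad (3.85)$$

$$\frac{\partial \Phi_L}{\partial w^{L-1}_{ij}} = g'_{L-1}\Big(\sum_{k=1}^{n_{L-1}} w^{L}_{1k} x_k^{L-1}\Big) \sum_{k=1}^{n_{L-1}} w^{L}_{1k} \frac{\partial x_k^{L-1}}{\partial w^{L-1}_{ij}}, \qquad (3.86)$$

where $\partial x_k^{L-1} / \partial w^{L-1}_{ij}$ can be written as

$$\frac{\partial x_k^{L-1}}{\partial w^{L-1}_{ij}} = \begin{cases} {g^{i}_{L-2}}'\Big(\sum_{p=1}^{n_{L-2}} w^{L-1}_{ip} x_p^{L-2}\Big)\, x_j^{L-2}, & k = i, \\ 0, & k \ne i. \end{cases} \qquad (3.87)$$

For $\eta = L-2$, we similarly have the following set of equations:

$$x_l^{L-2} = g^{l}_{L-3}\Big(\sum_{p=1}^{n_{L-3}} w^{L-2}_{lp} x_p^{L-3}\Big), \qquad (3.88)$$

$$x_k^{L-1} = g^{k}_{L-2}\Big(\sum_{l=1}^{n_{L-2}} w^{L-1}_{kl} x_l^{L-2}\Big), \qquad (3.89)$$

$$\Phi_L = g_{L-1}\Big(\sum_{k=1}^{n_{L-1}} w^{L}_{1k} x_k^{L-1}\Big), \qquad (3.90)$$

$$\frac{\partial \Phi_L}{\partial w^{L-2}_{ij}} = g'_{L-1}\Big(\sum_{k=1}^{n_{L-1}} w^{L}_{1k} x_k^{L-1}\Big) \sum_{k=1}^{n_{L-1}} w^{L}_{1k} \frac{\partial x_k^{L-1}}{\partial w^{L-2}_{ij}}, \qquad (3.91)$$

$$\frac{\partial x_k^{L-1}}{\partial w^{L-2}_{ij}} = {g^{k}_{L-2}}'\Big(\sum_{l=1}^{n_{L-2}} w^{L-1}_{kl} x_l^{L-2}\Big) \sum_{l=1}^{n_{L-2}} w^{L-1}_{kl} \frac{\partial x_l^{L-2}}{\partial w^{L-2}_{ij}}, \qquad (3.92)$$

$$\frac{\partial x_l^{L-2}}{\partial w^{L-2}_{ij}} = \begin{cases} {g^{i}_{L-3}}'\Big(\sum_{p=1}^{n_{L-3}} w^{L-2}_{ip} x_p^{L-3}\Big)\, x_j^{L-3}, & l = i, \\ 0, & l \ne i. \end{cases} \qquad (3.93)$$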


Based on the above formulas (3.83) - (3.93), we can compute the derivatives of
$\Phi_L$ with respect to $w^{\eta}_{ij}$ for each triple $(\eta, i, j)$. Below, we consider the following
special case.


Special  Case: 

A commonly found case for two-player differential game problems is that the cost
function $J$ is a function of the final state $x_N$. We have only considered the case of
one control variable for both players. Thus, $J = J_1(x_N, \psi_1, \ldots, \psi_N, \phi_1, \ldots, \phi_N)$
(refer to the pursuit-evasion game problem in Section 3.5). Both players try to
maximize/minimize the cost function $J = J_1(x_N, \psi_1, \ldots, \psi_N, \phi_1, \ldots, \phi_N)$ subject
to a set of difference equations:

$$\max \min J = J_1(x_N, \psi_1, \ldots, \psi_N, \phi_1, \ldots, \phi_N),$$

$$x_n = f(x_{n-1}, \psi_{n-1}, \phi_{n-1}), \quad n = 1, 2, \ldots, N,$$

where $\phi_i \in \Phi$ and $\psi_i \in \Psi$, $i = 1, 2, \ldots, N$. In this case, the neural control problem
for the case where $\hat{\psi}_i$ is known is formalized as follows:

$$\min J_1(x_N, \hat{\psi}_1, \ldots, \hat{\psi}_N, \phi_1, \ldots, \phi_N),$$

$$x_n = f(x_{n-1}, \hat{\psi}_{n-1}, \phi_{n-1}), \quad n = 1, 2, \ldots, N,$$

and the control sequence $(\phi_1, \phi_2, \ldots, \phi_N)$ is given by $\phi_i = \Phi_L(W_1, \ldots, W_L, x_i)$,
$i = 1, 2, \ldots, N$, and the updating rule takes the form

$$w^{\eta}_{ij}(N) = w^{\eta}_{ij}(N-1) - \alpha^{\eta}_{ij} \frac{\partial J_1}{\partial w^{\eta}_{ij}}(x_N, \hat{\psi}_1, \ldots, \phi_N, W), \qquad (3.94)$$

$$\eta = 1, 2, \ldots, L, \quad i = 1, 2, \ldots, n_\eta, \quad j = 1, 2, \ldots, n_{\eta-1},$$

which means that updating the weights happens at the (N-1)th step.
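
The chain-rule computation of this section can be illustrated with a small Python sketch. It is not the report's implementation: the logistic transfer function, the layer sizes used in the check, and the single step factor `alpha` (instead of one factor per weight) are assumptions made only for illustration, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    # Assumed transfer function g(.) for every layer.
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, x):
    """Propagate x through the network; return layer outputs and weighted sums."""
    outputs, sums = [list(x)], []
    for W in weights:
        z = [sum(w * v for w, v in zip(row, outputs[-1])) for row in W]
        sums.append(z)
        outputs.append([sigmoid(s) for s in z])
    return outputs, sums

def output_gradients(weights, x):
    """Return Phi_L and d(Phi_L)/d(w_ij) for every layer of a single-output net."""
    outputs, _ = forward(weights, x)
    phi = outputs[-1][0]
    # delta[i] = d(Phi_L) / d(weighted sum of neuron i in the current layer)
    delta = [phi * (1.0 - phi)]
    grads = [None] * len(weights)
    for eta in range(len(weights) - 1, -1, -1):
        prev = outputs[eta]                      # inputs to weight layer eta
        grads[eta] = [[d * p for p in prev] for d in delta]
        if eta > 0:
            W = weights[eta]
            back = [sum(W[i][j] * delta[i] for i in range(len(W)))
                    for j in range(len(prev))]
            delta = [b * o * (1.0 - o) for b, o in zip(back, prev)]
    return phi, grads

def gradient_step(weights, x, dJ_dphi, alpha):
    """Descent update w <- w - alpha * dJ/dPhi * dPhi/dw for all weights."""
    _, grads = output_gradients(weights, x)
    return [[[w - alpha * dJ_dphi * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, G)]
            for W, G in zip(weights, grads)]
```

Calling `gradient_step` with `dJ_dphi` equal to the derivative of the cost with respect to the network output gives one update of the form (3.94).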

3.6.2.  Implementation 


In order to use the above updating rule in real time, an on-line scheme has to be
considered. One such scheme is shown in Figure 3.19.

[Figure 3.19: time axis marked $t_0$, $t_1$, $t_2$, $t_3$. I: Identification Process; C: Control Process]

Figure 3.19: Identification-Plus-Control Process

An identification-plus-control period runs from $t_0$ to $t_3$ and has three
subintervals: $[t_0, t_1]$ (identification), $[t_1, t_2]$ (updating weights), and $[t_2, t_3]$ (control).
The identification part identifies the control strategy of the opponent based on the
environmental information; this process has been discussed in Section 3.5.
Updating the weights of the network happens in the subinterval
$[t_1, t_2]$, or $[t_{11}, t_2]$. In each of the small subintervals $[t_{1i}, t_{1(i+1)}]$, $i = 1, 2, \ldots, N-1$,
the weights of each layer are updated. The process is repeated and ends at $t_2$. At
$t_2$, actual control is applied to the system.

3.7. Applications of the Learning Algorithm to the Problem
of Aircraft Control in the Presence of Windshear

The particular problem we are considering here is that of control of an aircraft
encountering windshear after take-off. Much effort has gone into modeling and
identifying windshear, e.g. [23, 100], but only some of it has been concerned with
the design of controllers to enhance the chances for survival. Among these are the
studies of Miele et al. [63] and of Leitmann et al. [55]. A stabilizing controller is
proposed by Leitmann in which no a priori bounding information is needed.

In the last section, we derived the updating rule for the neural controller
for the problem of a differential game with neural networks. Since the
updating rule can be applied to many general control problems, we shall use the
results derived in the last section for our aircraft control problem. We shall also
present simulation results. The simulation results are not only reasonable but also
compare favorably with the controller proposed by Leitmann.

We use the following notation.

Notation

$D$ = drag force, lb;
$g$ = gravitational force per unit mass, ft sec$^{-2}$;
$h$ = vertical coordinate of aircraft center of mass (altitude), ft;
$L$ = lift force, lb;
$m$ = aircraft mass, lb ft$^{-1}$ sec$^{2}$;
$O$ = mass center of aircraft;
$S$ = reference surface, ft$^{2}$;
$t$ = time, sec;
$T$ = thrust force, lb;
$V$ = aircraft speed relative to wind-based reference frame, ft sec$^{-1}$;
$W_x$ = horizontal component of wind velocity, ft sec$^{-1}$;
$W_h$ = vertical component of wind velocity, ft sec$^{-1}$;
$x$ = horizontal coordinate of aircraft center of mass, ft;
$\alpha$ = relative angle of attack, rad;
$\gamma$ = relative path inclination, rad;
$\delta$ = thrust inclination, rad;
$\rho$ = air density, lb ft$^{-4}$ sec$^{2}$.

3.7.1. The Problem


Following Miele's lead, we employ equations of motion for the center of mass of
the aircraft, in which the kinematic variables are relative to the ground while the
dynamic ones are taken relative to a moving but non-rotating frame translating
with the wind velocity at the aircraft's center of mass. The kinematic equations
are

$$\dot{x} = V \cos\gamma + W_x, \qquad (3.95)$$

$$\dot{h} = V \sin\gamma + W_h. \qquad (3.96)$$

The dynamical equations are [55]

$$m\dot{V} = T\cos(\alpha + \delta) - D - mg\sin\gamma - m(\dot{W}_x \cos\gamma + \dot{W}_h \sin\gamma), \qquad (3.97)$$

$$mV\dot{\gamma} = T\sin(\alpha + \delta) + L - mg\cos\gamma + m(\dot{W}_x \sin\gamma - \dot{W}_h \cos\gamma), \qquad (3.98)$$

where $T = T(V)$ is the thrust force, $D = D(h, V, \alpha)$ is the drag, $L = L(h, V, \alpha)$
is the lift, and $W_x = W_x(x, h)$ and $W_h = W_h(x, h)$ are the horizontal and vertical
windshears, respectively. In these equations, $x(t), h(t), V(t), \gamma(t)$ are the state
variables and the angle of attack $\alpha(t)$ is the control variable.

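A discrete-time version of equations (3.97) and (3.98) is given by

$$V_{k+1} = f_1(V_k, T_k, D_k, \gamma_k, \dot{W}_{xk}, \dot{W}_{hk}, \alpha_k)$$
$$= V_k + \frac{T_k \cos(\alpha_k + \delta)\Delta t}{m} - \frac{D_k \Delta t}{m} - g \sin\gamma_k \Delta t - (\dot{W}_{xk} \cos\gamma_k + \dot{W}_{hk} \sin\gamma_k)\Delta t, \qquad (3.99)$$

$$\gamma_{k+1} = f_2(V_k, T_k, L_k, \gamma_k, \dot{W}_{xk}, \dot{W}_{hk}, \alpha_k)$$
$$= \gamma_k + \frac{T_k \sin(\alpha_k + \delta)\Delta t}{mV_k} + \frac{L_k \Delta t}{mV_k} - \frac{g\cos\gamma_k \Delta t}{V_k} + \frac{(\dot{W}_{xk} \sin\gamma_k - \dot{W}_{hk} \cos\gamma_k)\Delta t}{V_k}, \qquad (3.100)$$

where the notations $V_k$, $T_k$, $D_k$, $L_k$, $\gamma_k$, $\dot{W}_{xk}$, $\dot{W}_{hk}$ denote the variables at time
step $k$. Our goal is to design a controller $\alpha = \alpha_k$ such that the quantity $[h_{k+1} - h_r]^2$ is
minimized, where $h_{k+1}$ is a value calculated from $V_{k+1}$ and $\gamma_{k+1}$ and $h_r$ is a given
value. The cost function is given by

$$J(k+1) = (V_{k+1} \sin\gamma_{k+1} - h_r)^2. \qquad (3.101)$$

The neural controller is designed so that the weights update at each step to
minimize $J(k+1)$.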

3.7.2.  Assumptions 

The following assumptions are the same as Miele's [55]:

1)  The  rotational  inertia  of  the  aircraft  and  the  sensor  and  actuator  dynamics  are 
neglected. 

2)  The  aircraft  mass  is  constant. 

3)  Air  density  is  constant. 

4)  Flight  is  in  the  vertical  plane. 

5)  Maximum  thrust  is  used. 


3.7.3.  Bounded  Quantities 

In order to account for aircraft capabilities, it is assumed that there is a maximum
attainable value of the relative angle of attack $\alpha$; that is, $\alpha \in [0, \alpha_*]$, where $\alpha_* > 0$.
The range of practical values of the relative aircraft speed, $V$, is also limited, that
is,

$$\underline{V} \le V \le \overline{V}, \qquad (3.102)$$

where $\underline{V} > 0$ and $\overline{V} > \underline{V}$ depend on the specific aircraft [55].

3.7.4.  Force  Terms 

The thrust, drag and lift force terms can be approximated [55] by

$$T = A_0 + A_1 V + A_2 V^2, \qquad (3.103)$$

$$D = \tfrac{1}{2} C_D \rho S V^2, \qquad (3.104)$$

$$L = \tfrac{1}{2} C_L \rho S V^2, \qquad (3.105)$$

where $C_D = B_0 + B_1 \alpha^2$, $C_L = C_0 + C_1 \alpha$. The coefficients $A_0$, $A_1$, $A_2$ depend
on the altitude of the runway, the ambient temperature and the engine power
setting. $B_0$, $B_1$, $C_0$, $C_1$, on the other hand, depend on the flap setting and the
undercarriage position.
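
As an illustration, the force approximations of this subsection can be written as three small Python functions. This is only a sketch: the coefficient values are the ones quoted in Section 3.7.7, the angle of attack is taken in radians as in the Notation list, and the function names are hypothetical.

```python
# Coefficients quoted in Section 3.7.7 (Boeing 727 data).
A0, A1, A2 = 44564.0, -23.98, 0.01442   # thrust polynomial T = A0 + A1*V + A2*V^2
B0, B1 = 0.0218747, 0.6266795           # drag coefficient C_D = B0 + B1*alpha^2
C0, C1 = 0.2624993, 5.3714832           # lift coefficient C_L = C0 + C1*alpha
RHO, S = 0.002203, 1560.0               # air density and reference surface

def thrust(V):
    """Thrust force in lb for relative speed V in ft/sec."""
    return A0 + A1 * V + A2 * V * V

def drag(V, alpha):
    """Drag force in lb; alpha is the relative angle of attack in rad."""
    return 0.5 * (B0 + B1 * alpha ** 2) * RHO * S * V * V

def lift(V, alpha):
    """Lift force in lb; alpha is the relative angle of attack in rad."""
    return 0.5 * (C0 + C1 * alpha) * RHO * S * V * V
```

For example, `thrust(276.8)` evaluates the thrust polynomial at the initial speed given in Section 3.7.7.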

3.7.5.  Windshear  Model 

In this work, we utilize the windshear model [55] described by the following
equations:

$$W_x = -W_{x0} \sin(2\pi t / T_0),$$

$$W_h = -W_{h0} [1 - \cos(2\pi t / T_0)] / 2, \qquad (3.106)$$

where $W_{x0}$ and $W_{h0}$ are given constants reflecting the windshear intensity and $T_0$
is the total flight time through the downburst.
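
A short Python sketch of this wind model is given below, together with the time derivatives of the wind components that enter the dynamical equations. The derivatives are obtained here by differentiating the model and are not quoted from the report. The function names are hypothetical, and the value of the downburst duration used in the example is an assumption, since no value is stated in this section.

```python
import math

def wind(t, wx0, wh0, t0):
    """Wind velocity components (W_x, W_h) at time t for the sinusoidal model."""
    phase = 2.0 * math.pi * t / t0
    w_x = -wx0 * math.sin(phase)
    w_h = -wh0 * (1.0 - math.cos(phase)) / 2.0
    return w_x, w_h

def wind_rate(t, wx0, wh0, t0):
    """Time derivatives (dW_x/dt, dW_h/dt), found by differentiating wind()."""
    phase = 2.0 * math.pi * t / t0
    omega = 2.0 * math.pi / t0
    return -wx0 * omega * math.cos(phase), -wh0 * omega * math.sin(phase) / 2.0
```

With the assumed values `wx0 = 50`, `wh0 = 30` and `t0 = 60`, the horizontal component is a headwind during the first half of the period and the downdraft peaks at `t0 / 2`.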

3.7.6.  Controller  Design 

We employ a neural controller with one input layer of four neurons. The input
variables to the network are $V$, $\dot{V}$, $\gamma$, $\dot{\gamma}$. The control output is given by

$$\alpha_k = \Phi\big(w_1 (V - V(0)) + w_2 \dot{V} + w_3 (\gamma - \gamma(0)) + w_4 \dot{\gamma}\big),$$

where the initial values $V(0)$ and $\gamma(0)$ will be given in the next subsection. The
threshold function $\Phi$ is a sigmoidal function in which $g$ is a design gain and $A$ is
the saturation limit. In our study $A = \alpha_*$. As discussed before, the formula for
updating the weights is

$$w_i(k+1) = w_i(k) - \alpha_i \frac{\partial J(k+1)}{\partial w_i(k)}, \qquad (3.107)$$

where $\alpha_i$ is the step factor for the weight $w_i$ and $\partial J(k+1) / \partial w_i(k)$ is given by
the following set of equations:


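$$\frac{\partial J(k+1)}{\partial w_i(k)} = \frac{\partial J(k+1)}{\partial \alpha_k} \frac{\partial \Phi_k}{\partial w_i(k)}, \qquad (3.108)$$

$$\frac{\partial J(k+1)}{\partial \alpha_k} = \frac{\partial J(k+1)}{\partial \gamma_{k+1}} \frac{d f_2(\cdot)}{d \alpha_k} + \frac{\partial J(k+1)}{\partial V_{k+1}} \frac{d f_1(\cdot)}{d \alpha_k}, \qquad (3.109)$$

$$\frac{\partial J(k+1)}{\partial \gamma_{k+1}} = 2(V_{k+1} \sin\gamma_{k+1} - h_r) V_{k+1} \cos\gamma_{k+1}, \qquad (3.110)$$

$$\frac{\partial J(k+1)}{\partial V_{k+1}} = 2(V_{k+1} \sin\gamma_{k+1} - h_r) \sin\gamma_{k+1}, \qquad (3.111)$$

$$\frac{d f_2(\cdot)}{d \alpha_k} = \frac{\partial f_2(\cdot)}{\partial \alpha_k} + \frac{\partial f_2(\cdot)}{\partial L_k} \frac{\partial L_k}{\partial \alpha_k}, \qquad (3.112)$$

$$\frac{\partial f_2(\cdot)}{\partial \alpha_k} = \frac{T_k \Delta t}{m V_k} \cos(\alpha_k + \delta), \qquad (3.113)$$

$$\frac{\partial f_2(\cdot)}{\partial L_k} = \frac{\Delta t}{m V_k}, \qquad (3.114)$$

$$\frac{\partial L_k}{\partial \alpha_k} = \tfrac{1}{2} C_1 \rho S V_k^2, \qquad (3.115)$$

$$\frac{d f_1(\cdot)}{d \alpha_k} = \frac{\partial f_1(\cdot)}{\partial \alpha_k} + \frac{\partial f_1(\cdot)}{\partial D_k} \frac{\partial D_k}{\partial \alpha_k}, \qquad (3.116)$$

$$\frac{\partial f_1(\cdot)}{\partial \alpha_k} = -\frac{T_k \Delta t}{m} \sin(\alpha_k + \delta), \qquad (3.117)$$

$$\frac{\partial f_1(\cdot)}{\partial D_k} = -\frac{\Delta t}{m}, \qquad (3.118)$$

$$\frac{\partial D_k}{\partial \alpha_k} = B_1 \alpha_k \rho S V_k^2. \qquad (3.119)$$

3.7.7. Numerical Data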


As a specific model, we use a model of a Boeing 727 aircraft with JT8D-17
turbofan engines. We assume that the aircraft has become airborne from a runway
located at sea level. The data are identical to those of Miele:

$\alpha_*$ = 16°,
$C$ = 3°/sec,
$A_0$ = 44564.0 lb,
$A_1$ = -23.98 lb ft$^{-1}$ sec,
$A_2$ = 0.01442 lb ft$^{-2}$ sec$^{2}$,
$\delta$ = 2°,
$\rho$ = 0.002203 lb ft$^{-4}$ sec$^{2}$,
$S$ = 1560 ft$^{2}$,
$B_0$ = 0.0218747,
$B_1$ = 0.6266795,
$C_0$ = 0.2624993,
$C_1$ = 5.3714832,
$mg$ = 180000 lb,
$\underline{V}$ = 184 ft sec$^{-1}$,
$\overline{V}$ = 422 ft sec$^{-1}$,
$\Delta t$ = 0.001 sec,
$h_r$ = 33.6807,

while the initial conditions are $x(0)$ = 0 ft, $h(0)$ = 50 ft, $V(0)$ = 276.8 ft/sec, and
$\gamma(0)$ = 6.989°.
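
With these data, the sensitivity of the one-step cost to the angle of attack, which drives the weight update of Section 3.7.6, can be checked numerically. The Python sketch below is an illustration and not the code used for the reported simulations: the value of g (32.174 ft/sec^2) is an assumption that is not listed in the data, the wind-rate arguments are arbitrary inputs, and the function names are hypothetical.

```python
import math

G = 32.174                      # ft/sec^2, assumed value of g (not in the data list)
M = 180000.0 / G                # aircraft mass obtained from mg = 180000 lb
DELTA = math.radians(2.0)       # thrust inclination
RHO, S = 0.002203, 1560.0
A0, A1, A2 = 44564.0, -23.98, 0.01442
B0, B1, C0, C1 = 0.0218747, 0.6266795, 0.2624993, 5.3714832
DT, H_R = 0.001, 33.6807

def step(V, gamma, alpha, wx_dot=0.0, wh_dot=0.0):
    """One discrete step of the speed and path-inclination equations."""
    q = RHO * S * V * V
    T = A0 + A1 * V + A2 * V * V
    D = 0.5 * (B0 + B1 * alpha ** 2) * q
    L = 0.5 * (C0 + C1 * alpha) * q
    V_next = V + DT * ((T * math.cos(alpha + DELTA) - D) / M
                       - G * math.sin(gamma)
                       - (wx_dot * math.cos(gamma) + wh_dot * math.sin(gamma)))
    gamma_next = gamma + DT * ((T * math.sin(alpha + DELTA) + L) / (M * V)
                               - G * math.cos(gamma) / V
                               + (wx_dot * math.sin(gamma)
                                  - wh_dot * math.cos(gamma)) / V)
    return V_next, gamma_next

def cost(V_next, gamma_next):
    """One-step cost J(k+1) = (V sin(gamma) - h_r)^2."""
    return (V_next * math.sin(gamma_next) - H_R) ** 2

def dcost_dalpha(V, gamma, alpha, wx_dot=0.0, wh_dot=0.0):
    """dJ(k+1)/d(alpha_k) assembled by the chain rule through f_1 and f_2."""
    q = RHO * S * V * V
    T = A0 + A1 * V + A2 * V * V
    V_next, gamma_next = step(V, gamma, alpha, wx_dot, wh_dot)
    err = V_next * math.sin(gamma_next) - H_R
    dJ_dgamma = 2.0 * err * V_next * math.cos(gamma_next)
    dJ_dV = 2.0 * err * math.sin(gamma_next)
    df2 = (T * DT / (M * V)) * math.cos(alpha + DELTA) \
        + (DT / (M * V)) * 0.5 * C1 * q
    df1 = -(T * DT / M) * math.sin(alpha + DELTA) - (DT / M) * B1 * alpha * q
    return dJ_dgamma * df2 + dJ_dV * df1
```

Comparing `dcost_dalpha` with a central finite difference of `cost(*step(...))` in `alpha` is a quick way to confirm that the analytic derivative is consistent with the discrete model.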

3.7.8.  Simulation  Results 

Numerical simulations were carried out for the case where the windshear intensity
is $W_{x0}/W_{h0}$ = 50/30. Simulation results show that the neural controller performs
well in the presence of wind. From the windshear model, we know that in the first
30 seconds the horizontal wind blows against the aircraft. In the first 20 seconds,
the aircraft gains altitude. As the wind becomes less strong, the lift of the
aircraft decreases and the aircraft gradually slows its climbing rate and begins
losing altitude at the 20th second. To compensate for this, the angle of attack
increases correspondingly. This is shown in Figure 3.20. After the 30th second,
the horizontal wind blows in the direction of the aircraft, which continues losing
altitude even though the angle of attack increases to try to compensate for the
loss. As the wind becomes weaker from the 45th second to the 60th second,
the aircraft increases its altitude.

Under the same conditions, the neural controller works better than those of
Leitmann and Miele in the sense that the control value reached the saturation
limit (in this case 16 degrees for the angle of attack) for only a short period of
time. The performance of the aircraft is almost the same as that of Leitmann.
That is because the angle of attack did not rise high enough from the 45th
second to the 60th second to compensate for the loss of altitude of the aircraft.
The reason is that the steepest descent algorithm has a low convergence rate. To
improve the performance, one should consider using an optimization method with
a higher convergence rate. In fact, this is one of our further research directions.
For a windshear with stronger wind intensity, e.g. $W_{x0}/W_{h0}$ = 80/48, we need
to adjust the four weights accordingly. With this type of neural controller, the
performance depends on the sensitivity of the gradient of the cost function to the
change of the measured error. With a suitable choice of the learning rates
$\alpha_1, \alpha_2, \alpha_3, \alpha_4$, the performance should be improved.


Figure 3.20: Simulation Results For the Aircraft Control Problem

4. Optimal Control Problem in the Layered Defense Project

4.1.  Introduction 

In this chapter, we discuss an application of artificial intelligence methodology,
more exactly a rule-based expert system, to a class of pursuit-evasion game
problems based on Semantic Control Theory. The object of this effort is to
apply the principles of semantic control theory to situation assessment in a
layered defense system. A tactical decision aid (TDA) has been developed to assist
its user in the selection of heading, speed, and countermaneuvers in the presence
of a real or potential threat. The project, started in June 1991, is a cooperative
effort of the Center for Semantic Control, Washington University, and the
Electronics and Space Corporation, and has proven to be successful.

A game problem with multiple pursuers and a single evader is the main one for this
project. The following players exist: one evader, also termed Ownship, several
pursuers, and a limited number of shadowing players engaged in the game situation.
There are two types of pursuers: [i] primary pursuers, which usually represent
aggressive aircraft equipped with air-to-air and all-aspect missiles; [ii] secondary
pursuers, which typically represent the offensive missiles launched by primary
pursuers. Both primary pursuers and the evader have the capability of spawning
shadowing players, which represent passive objects to blind the opponents, such as
flares. Dynamics may or may not be incorporated into the shadowing players,
depending on the types of shadowing players used. The initial stage of the project
will discuss the situation with one evader, one or several secondary pursuers, and
a limited number of shadowing players for both the primary pursuer and the evader.

The TDA has no control over the strategies of either primary pursuers or
secondary pursuers, yet it has the capability of detecting and identifying their
strategies and maneuvers, mainly by means of Contact Reports. The Contact
Reports contain information about the pursuers and the evader, such as
location (coordinates), speed, heading, bearing, etc., all of which can be processed
to assess the situation. The TDA receives the Contact Reports at fixed intervals
so that the information can be updated periodically. While the human will
assume the role of the controller, the functions that the TDA will have to perform
are [25]: obtaining data from Contact Reports and updating the player information,
using new information to assess the current game situation, etc. Processing new
information from the Contact Reports includes updating the information about
each of the players and storing the old information on each player in an instance
of the class OLD. Once the process is finished, a rule or the optimal control
strategy will be used to govern the next movement of the evader. A detailed
discussion is given below.

As shown in Figure 4.1, the structure of the TDA is organized hierarchically.
Each player in the TDA is called an object. Typical examples of objects are the
primary pursuers, the secondary pursuers, and the evader, which are also instances
of some appropriate class. Each object belongs to one class, and the classes are
organized hierarchically. Each class is a subclass of its parent class, and all
classes are subclasses of the root class called "ROOT". Each object has its
attributes, stored in memory called "Slots", which are either inherited from its
parents or locally resident. Associated with each object are the rules, methods,
and functions which can be used to interact with objects. The information about
each player of the game, the capabilities of each player, and the game situation
are all stored in the slots of each player. Table 4.1 summarizes the slots for the
primary pursuers and the evader. An object-oriented formulation is selected because
it allows incremental refinement of its situation assessment, knowledge of the
pursuer's capability, and evasion strategies.
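
The class-and-slot organization described above can be pictured with a small Python sketch. It does not reproduce the KAPPA-PC implementation of the TDA: all class names other than the root, the chosen slot names and their default values are hypothetical and serve only to show how slots can be inherited from parent classes or held locally by an object.

```python
class Root:
    """Stand-in for the root class "ROOT"; it defines no slots itself."""
    slots = {}

    def __init__(self, **local_slots):
        # Gather inherited slot defaults, from the most general class downwards.
        self.slots = {}
        for cls in reversed(type(self).__mro__):
            self.slots.update(getattr(cls, "slots", {}))
        # Locally resident values override the inherited defaults.
        self.slots.update(local_slots)

class Player(Root):
    # Hypothetical slots shared by every player in the game.
    slots = {"position": (0.0, 0.0), "speed": 0.0, "heading": 0.0}

class PrimaryPursuer(Player):
    slots = {"classification": "primary pursuer"}

class Evader(Player):
    slots = {"classification": "evader", "destination": None}
```

An object created as `Evader(speed=12.0)` then carries the inherited `position` and `heading` slots together with its locally set `speed`.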


Table 4.1: Table of Slots

Player
Position
Speed
Range
Bearing
Heading
Contact ID
Classification
Event
Confidence
Lifetime
MaxSpeed
Perception
Home
Destination
Risk Level
MaxSafe Distance
Panic Distance
Number of Shadowing Players
Effectiveness
Effectiveness Duration
Spatial Effectiveness

The mission of the evader is to depart from Home, while evading the pursuit
of its offensive opponent, and to head to a fixed location called "Destination".
The role of the evader is to assess the game situation, detect the pursuer strategy
if necessary and make a decision based on the updated information. It does
not assume any offensive capabilities. With this assumption, the TDA presents
suggestions of movement for the user's approval. The movement of the evader
is governed by the rule fired from the rule-based system or by the optimal control
strategy. Whether a rule or the optimal control strategy will be used depends
on the range between the two players. For a fixed distance, there may be more
than one rule to select from, in which case a quantity called "utility" plays a key
role. A utility is a positive real number ranging from zero to one, associated with
each pursuer. The utility can be thought of as a measure of the threat to the evader.
The pursuer with the highest utility value is said to pose the greatest threat to the
evader and hence should be paid the greatest attention. The utility of the pursuer
depends on five components (range, heading, elevation, speed and bearing), all
of which have been obtained through processing the information from the Contact
Reports. A detailed discussion of the utility calculation can be found in [25].

Recent work in the area of Semantic Control Theory [76] provides the means
for our project. In this paradigm, a control problem is broken into three blocks,
namely, Identifier, Goal Selector, and Adaptor [76]. In this project, the Goal
Selector is designed for the following: [i] to select the rule from the rule-based
system based on the information provided by the System Identifier, which has
preprocessed data from the Contact Reports, [ii] to activate the optimal control
law based on the information of range, and [iii] to compute the heading for the
next movement of the evader. The decision of the Goal Selector depends on
several factors: range, heading, bearing, speed and elevation. When the pursuer
is within a certain distance from the evader, the optimal control law is activated
to achieve a fast response to the situation. In other cases, the rule-based system
plays a key role. There are four sets of rules [25]. They are: [i] the set of rules for
the case in which no pursuers are within a specified range, [ii] the set of rules for
the case in which one primary pursuer is within a specified range, [iii] the set of
rules for the case in which one secondary pursuer is within a specified range, and
[iv] the set of rules for the case in which both a primary pursuer and a secondary
pursuer are within a specified range. Each of these four rule sets can have several
rules. The rule set appropriate to the situation is made active and used to make
recommendations for safely moving the evader. Which rule will be fired depends
on the value of the utility associated with each pursuer, which represents the
measure of safety for each player.

The function of the TDA is to provide a movement set-point recommendation for
the user. Therefore, a man-machine graphic interface is an essential part of the
project. A human assumes control of the final decision. The Adaptor consists
of four different graphic displays: Action Panel, Control Panel, Local View and
Global View (see Figure 4.2). The Action Panel shown in Figure 4.2(a) displays
the set-point recommendation for the user, the heading, speed and location of each
player, and the current rule being used. There are several buttons available for
the user to decide either to use the recommendation of the TDA or to enter his own
command. The Action Panel is updated in real time so that the user can have
a view of the ongoing game situation. The Control Panel (see Figure 4.2(b)),
on the other hand, is in control of the behavior of each player, monitors the
game situation, and has the authority to change it dramatically. The Control
Panel initializes the game, modifies the attributes of each player in the game, and
runs/stops the game. The Control Panel is actually the first display shown to the
user when the simulation begins. Two coordinate frames are used in the project.
They are the inertial frame, which provides a global view of the game, and the
frame centered at the speed direction of the evader, which provides a local view.
Local and Global views (see Figure 4.2) provide the basic displays in the initial
stage of the project.


Figure  4.2:  Display  Panels 


The TDA is currently implemented on a DELL SYSTEM 333D(TM) Personal
Computer, using KAPPA-PC(TM) and Toolbook(TM) under the Windows
environment. For a complete description of the project and its operation, please
refer to [25].

In  sections  that  follow,  we  shall  discuss  the  optimal  control  problem  that 
arises  from  the  project.  The  background  for  this  problem  is  as  follows.  The 
Layered  Defense  Project's  aim  is  to  study,  analyze  and  solve  a  class  of  pursuit- 
evasion  problems  and  to  develop  a  tactical  decision  aid.  which,  in  the  presence 
of a real-time or potential threat, aids its user in the selection of heading, speed,
and countermaneuvers. An assumption at the initial stage of this project is that
there are two players in the game situation: one pursuer and one evader. The
control strategy for the pursuer is generated by the so-called Scenario Generator,
which also provides the Contact Report to the TDA. The evader has the capability
of identifying the control strategy of the pursuer but has no control over it. The
role of the evader is [i] to assess the game situation and identify the strategy of
the pursuer, and [ii] to take corresponding actions governed either by the rule fired
from the rule-based system or by the control value of the optimal control law,
depending on the range $R$ between the pursuer and the evader. If $R$ is greater than
some given value $R_0$, the evader may want to continue doing what he has been doing.
When $R$ decreases to some given value $R_1$ with $R_1 < R_0$, which usually represents
the situation that the pursuer is getting closer to the evader, the value of utility
for the evader is high enough to fire a rule from the rule-based system. Which
rule is fired depends on the utility associated with the rule [25]. However, if
the rule fired for governing the next movement cannot improve the situation and
$R$ further decreases to some given value $R_2$, where $R_2 < R_1 < R_0$, an optimal
control strategy is employed to yield an accurate, fast response to the situation.
The assumption that two players are engaged in the game is reasonable, since the
single most aggressive player poses the highest potential threat to the ownship and
hence will be paid much more attention than the others, if they exist, when the
optimal control law is used.
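The range-based switching between decision layers described above can be sketched as follows. This is only an illustration: the function name `evader_mode` and the returned labels are hypothetical, the sketch looks at the range alone (it ignores the condition that the fired rule failed to improve the situation), and it assumes that the evader simply continues its current behaviour for any range above $R_1$.

```python
def evader_mode(R, R0, R1, R2):
    """Pick the decision layer for the evader from the current range R.

    Assumes thresholds ordered as R2 < R1 < R0 (as in the text).
    The labels returned here are illustrative, not taken from the report.
    """
    if not (R2 < R1 < R0):
        raise ValueError("thresholds must satisfy R2 < R1 < R0")
    if R <= R2:
        # Closest band: the optimal control law takes over.
        return "optimal_control"
    if R <= R1:
        # Intermediate band: a rule is fired from the rule-based system.
        return "rule_based"
    # Assumption of this sketch: for R > R1 the evader keeps doing
    # what it has been doing (the text states this only for R > R0).
    return "continue"
```

For example, with hypothetical thresholds `R0=100`, `R1=60`, `R2=30`, a range of 50 selects the rule-based layer and a range of 20 selects the optimal control law.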

With the line-of-sight model, our problem can be described as follows: given
$\sigma_2$, a control strategy for the pursuer, find an optimal control for the evader,
i.e., $\sigma_{1,\mathrm{optimal}}$ with $|\sigma_{1,\mathrm{optimal}}| \le 1$, such that the
range between the two players reaches some prespecified value in minimum time.

4.2. Optimal Control Problem

4.2.1.  Line  of  Sight  Coordinates 

A line-of-sight coordinate model has been used to study the pursuit-evasion game
with fixed role determination [17, 18, 19, 41, 83, 92]. Using the line of sight as
common reference, we have three state variables. These state variables are the
range $0 < R < \infty$ and the two off-boresight angles $-\pi < \phi_1 < \pi$,
$-\pi < \phi_2 < \pi$ [18, 19].

The equations of motion in the general line of sight coordinates are

$\dot{R} = -(\cos\phi_1 + \cos\phi_2),$  (4.1)

$\dot{\phi}_1 = (\sin\phi_1 + \sin\phi_2)/R + \sigma_1,$  (4.2)

$\dot{\phi}_2 = (\sin\phi_1 + \sin\phi_2)/R + \sigma_2,$  (4.3)

where $\sigma_1$ and $\sigma_2$ are the respective control variables (turning rates) constrained
by

$|\sigma_i| \le 1, \quad i = 1, 2.$  (4.4)

The  geometry  of  the  engagement  is  depicted  in  Figure  4.3. 


Figure 4.3: Geometry of the Engagement

A major element in the differential game formulation is the definition of the
terminal surface (target set). Such a terminal surface represents the firing envelope
of the pursuer aircraft weapon system. The firing envelope of an air-to-air missile
for each player $i$ ($i = 1, 2$) is a subspace defined by an implicit equation in three
variables

$F(R, \phi_i, \alpha_i) = 0,$  (4.5)

where $R$ is the range (the magnitude of the line of sight vector), and $\phi_i$ and $\alpha_i$
have the relation

$\phi_i + \alpha_i = \pi.$  (4.6)

In [18], Davidovitz et al. specified the terminal surface of the game, a special form
of equation (4.5), as

$R(t_f) - \beta = 0,$  (4.7)

where $t_f$ is the final time of the game and $\beta$ is some prespecified positive value.
With this terminal surface, one can evaluate the necessary conditions, which are
used to obtain the optimal control strategies. Such control strategies can be
evaluated, using retrointegration, by calculating the so-called optimal terminal
strategy at the terminal surface. The target set eq. (4.7) has been successfully
used for the pursuit-evasion game analysis.

On the other hand, for an air combat duel between similar aggressive fighter
aircraft, both equipped with the same type of guided missiles, different target sets
are used to represent the effective firing envelope of an all-aspect fire-and-forget
air-to-air missile:

$-\beta \le \phi_i \le \beta, \quad i = 1, 2,$  (4.8)

$\underline{R}_i(\phi_j) \le R \le \bar{R}_i(\phi_j), \quad j = 3 - i,$  (4.9)

where $\beta$ is the off-boresight limit for missile firing, while $\underline{R}$ and $\bar{R}$ are
the minimum and maximum normalized firing ranges, respectively, given by

$\underline{R}_i(\phi_j) = a_i + b_i\cos\phi_j,$  (4.10)

$\bar{R}_i(\phi_j) = (R_0)_i - |\phi_j + \sin\phi_j|,$  (4.11)

where $a_i$, $b_i$ and $(R_0)_i$ are the normalised parameters (see [19]).

For our problem, a terminal surface similar to that in [18] is used, i.e.,
$R(t_f) = \beta$, where $\beta$ is a prespecified positive value. In [18], $\beta$ is
specified to be less than the initial distance $R(t_0)$, since $\beta$ represents the
distance for capture. In our case, however, $\beta$ is a positive value greater than
$R(t_0)$, since our problem is one of escape.

In general, the solution consists of the decomposition of the game space into
four regions: the respective winning zones of the two opponents, the draw zone,
and the region where the game terminates by a mutual kill. Davidovitz et al. [19]
presented a qualitative study, the first of its kind, of an air combat between two
similar aircraft equipped with modern air-to-air missiles, which is modeled as
a two-target differential game. The results of their study also reveal several as yet
unknown elements to be expected in future air combat.

For our problem, since the initial stage of the project only discusses two players
(the pursuer and the evader, the latter also termed the “Ownship”), the line-of-sight
coordinate model is ideal for our quantitative analysis. However, unlike [19], where
two aggressive aircraft with all-aspect fire-and-forget air-to-air missiles are
considered, we have a limited number of players engaged in the game: pursuer and
evader. To better model the real-time pursuer-evader situation, we modify the
classical line-of-sight model in [17] to incorporate the speeds of the two players:

$\dot{R} = -(v_p\cos\phi_1 + v_e\cos\phi_2),$  (4.12)

$\dot{\phi}_1 = (v_p\sin\phi_1 + v_e\sin\phi_2)/R + \sigma_1,$  (4.13)

$\dot{\phi}_2 = (v_p\sin\phi_1 + v_e\sin\phi_2)/R + \sigma_2,$  (4.14)

where the state variables $R, \phi_1, \phi_2$ have the same definitions as before, $v_p$
represents the speed of the pursuer and $v_e$ that of the evader. Again, the respective
controls (turning rates) are constrained by

$|\sigma_i| \le 1.$
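The modified line-of-sight kinematics (4.12)-(4.14) can be evaluated numerically as in the sketch below. The function names `los_rates` and `euler_step` and the choice of a forward Euler step are illustrative assumptions, not part of the report; with `vp = ve = 1` the rates reduce to (4.1)-(4.3).

```python
import math

def los_rates(R, phi1, phi2, sigma1, sigma2, vp=1.0, ve=1.0):
    """Right-hand sides of (4.12)-(4.14): returns (R_dot, phi1_dot, phi2_dot)."""
    if abs(sigma1) > 1.0 or abs(sigma2) > 1.0:
        raise ValueError("turning rates must satisfy |sigma_i| <= 1")
    # Common line-of-sight rotation term shared by both angle equations.
    los_term = (vp * math.sin(phi1) + ve * math.sin(phi2)) / R
    R_dot = -(vp * math.cos(phi1) + ve * math.cos(phi2))
    return R_dot, los_term + sigma1, los_term + sigma2

def euler_step(state, sigma1, sigma2, dt, vp=1.0, ve=1.0):
    """Advance (R, phi1, phi2) by one forward Euler step of size dt."""
    R, phi1, phi2 = state
    R_dot, p1_dot, p2_dot = los_rates(R, phi1, phi2, sigma1, sigma2, vp, ve)
    return (R + dt * R_dot, phi1 + dt * p1_dot, phi2 + dt * p2_dot)
```

A higher-order integrator could be substituted for `euler_step` without changing `los_rates`.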


4.2.2.  Optimal  Control  Law 


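The problem can be stated as

$\min_{\sigma_1} J$  (4.15)

subject to

$\dot{R} = -(v_p\cos\phi_1 + v_e\cos\phi_2),$  (4.16)

$\dot{\phi}_1 = (v_p\sin\phi_1 + v_e\sin\phi_2)/R + \sigma_1,$  (4.17)

$\dot{\phi}_2 = (v_p\sin\phi_1 + v_e\sin\phi_2)/R + \hat{\sigma}_2,$  (4.18)

where $(R(t_0), \phi_1(t_0), \phi_2(t_0)) = (R_0, \phi_{10}, \phi_{20})$, the values
$(R_0, \phi_{10}, \phi_{20})$ are given, and the control for the pursuer has been replaced
by its estimated value $\hat{\sigma}_2$.

Consider the Hamiltonian given by

$H = \lambda_0 + \lambda_R(-(v_p\cos\phi_1 + v_e\cos\phi_2))
   + \frac{\lambda_1 + \lambda_2}{R}(v_p\sin\phi_1 + v_e\sin\phi_2)
   + (\lambda_1\sigma_1 + \lambda_2\hat{\sigma}_2),$  (4.19)

where $\lambda_0$, $\lambda_R$, $\lambda_1$ and $\lambda_2$ are the respective components of the
gradient vector satisfying the adjoint equations

$\dot{\lambda}_R = (v_p\sin\phi_1 + v_e\sin\phi_2)\left(\frac{\lambda_1 + \lambda_2}{R^2}\right),$  (4.20)

$\dot{\lambda}_1 = \left(-\lambda_R\sin\phi_1 - \frac{\lambda_1 + \lambda_2}{R}\cos\phi_1\right)v_p,$  (4.21)

$\dot{\lambda}_2 = \left(-\lambda_R\sin\phi_2 - \frac{\lambda_1 + \lambda_2}{R}\cos\phi_2\right)v_e,$  (4.22)

$\dot{\lambda}_0 = 0.$  (4.23)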

The optimal control is obtained by

$\min_{\sigma_1} H,$  (4.24)

which yields

$\sigma_1(t) = -\mathrm{sgn}\,\lambda_1(t).$  (4.25)

Next, we shall state a lemma which will be used later.

Lemma 4.1 The following equality holds:

$\lambda_R^2 + \left(\frac{\lambda_1 + \lambda_2}{R}\right)^2 = C,$  (4.26)

where $C$ is a constant.


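Proof: Differentiating the left-hand side of eq. (4.26), we have

$2\lambda_R\dot{\lambda}_R + 2\,\frac{\lambda_1 + \lambda_2}{R}\cdot
   \frac{(\dot{\lambda}_1 + \dot{\lambda}_2)R - (\lambda_1 + \lambda_2)\dot{R}}{R^2}$

$= 2\,\frac{\lambda_1 + \lambda_2}{R^2}\Big[\lambda_R(v_p\sin\phi_1 + v_e\sin\phi_2)
   + \Big(\big(-\lambda_R\sin\phi_1\,v_p - \frac{\lambda_1 + \lambda_2}{R}\cos\phi_1\,v_p\big)R
   + \big(-\lambda_R\sin\phi_2\,v_e - \frac{\lambda_1 + \lambda_2}{R}\cos\phi_2\,v_e\big)R
   + (\lambda_1 + \lambda_2)(v_p\cos\phi_1 + v_e\cos\phi_2)\Big)/R\Big]$

$= 2\,\frac{\lambda_1 + \lambda_2}{R^2}\Big[\lambda_R v_p\sin\phi_1 + \lambda_R v_e\sin\phi_2
   - \lambda_R v_p\sin\phi_1 - \frac{\lambda_1 + \lambda_2}{R}v_p\cos\phi_1
   - \lambda_R v_e\sin\phi_2 - \frac{\lambda_1 + \lambda_2}{R}v_e\cos\phi_2
   + \frac{\lambda_1 + \lambda_2}{R}(v_p\cos\phi_1 + v_e\cos\phi_2)\Big]$

$= 0.$  (4.27)

Thus, eq. (4.26) follows.

Q.E.D.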
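Lemma 4.1 can also be checked numerically by integrating the state equations (4.16)-(4.18) together with the adjoint equations (4.20)-(4.22) and monitoring the left-hand side of (4.26). The sketch below is an illustration only: the classical fourth-order Runge-Kutta integrator, the constant controls, the function names, and all numerical values are assumptions made for the check and are not taken from the report.

```python
import math

def derivs(y, sigma1, sigma2, vp, ve):
    """State equations (4.16)-(4.18) and adjoint equations (4.20)-(4.22)."""
    R, p1, p2, lam_R, lam1, lam2 = y
    S = vp * math.sin(p1) + ve * math.sin(p2)
    Cc = vp * math.cos(p1) + ve * math.cos(p2)
    mu = (lam1 + lam2) / R
    return [-Cc,
            S / R + sigma1,
            S / R + sigma2,
            mu * S / R,                                        # (4.20)
            -vp * (lam_R * math.sin(p1) + mu * math.cos(p1)),  # (4.21)
            -ve * (lam_R * math.sin(p2) + mu * math.cos(p2))]  # (4.22)

def rk4_step(y, dt, *args):
    """One classical Runge-Kutta step for the six-dimensional system."""
    k1 = derivs(y, *args)
    k2 = derivs([a + 0.5 * dt * b for a, b in zip(y, k1)], *args)
    k3 = derivs([a + 0.5 * dt * b for a, b in zip(y, k2)], *args)
    k4 = derivs([a + dt * b for a, b in zip(y, k3)], *args)
    return [a + dt / 6.0 * (b + 2 * c + 2 * d + e)
            for a, b, c, d, e in zip(y, k1, k2, k3, k4)]

def invariant(y):
    """Left-hand side of (4.26)."""
    R, _, _, lam_R, lam1, lam2 = y
    return lam_R ** 2 + ((lam1 + lam2) / R) ** 2

def invariant_drift(y0, sigma1=-1.0, sigma2=0.55, vp=25.0, ve=20.0,
                    dt=1e-3, steps=1000):
    """Absolute change of the invariant after `steps` integration steps."""
    y = list(y0)
    c0 = invariant(y)
    for _ in range(steps):
        y = rk4_step(y, dt, sigma1, sigma2, vp, ve)
    return abs(invariant(y) - c0)
```

Calling `invariant_drift([100.0, 0.52, 1.0, 0.5, 3.0, -1.0])` should return a value at the level of integration error, in line with the lemma; the controls do not enter the invariant, so the same holds for other admissible turning rates.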


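REMARKS:

The constant $C$ could be zero or nonzero. If $C$ is zero, $\lambda_R(t) = 0$ for all
$t \ge 0$. In particular, $\lambda_R(0) = 0$. Thus, we may choose $\lambda_R(0) \ne 0$ such
that $C \ne 0$. For $C \ne 0$, we can normalize the formula (4.26) such that

$\lambda_R^2 + \left(\frac{\lambda_1 + \lambda_2}{R}\right)^2 = 1.$  (4.28)

Using eq. (4.17), eq. (4.20) can be written as

$\dot{\lambda}_R = (v_p\sin\phi_1 + v_e\sin\phi_2)\left(\frac{\lambda_1 + \lambda_2}{R^2}\right)
   = (\dot{\phi}_1 - \sigma_1)(1 - \lambda_R^2)^{1/2}.$  (4.29)

Now,

$\frac{\lambda_1 + \lambda_2}{R} = \cos(\phi_1 + \theta_1 + C_0).$  (4.38)

Therefore,

$\dot{\lambda}_1 = -v_p\left(\lambda_R\sin\phi_1 + \frac{\lambda_1 + \lambda_2}{R}\cos\phi_1\right)$

$= -v_p\big(\sin(\phi_1 + \theta_1 + C_0)\sin\phi_1 + \cos(\phi_1 + \theta_1 + C_0)\cos\phi_1\big)$

$= -v_p\cos(\phi_1 + \theta_1 + C_0 - \phi_1)$

$= -v_p\cos(\theta_1 + C_0)$

$= -v_p\cos\left(C_0 - \phi_{10} - \int_0^t \sigma_1(\tau)\,d\tau\right).$  (4.39)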
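The step from (4.29) to (4.38) can be sketched as follows. The sketch assumes the normalization $C = 1$, the positive root $(\lambda_1 + \lambda_2)/R = +(1 - \lambda_R^2)^{1/2}$, the constant $C_0 = \sin^{-1}\lambda_R(0)$, and the auxiliary angle $\theta_1(t) = -\phi_{10} - \int_0^t \sigma_1(\tau)\,d\tau$ that appears in (4.39); it is a reconstruction of the argument, not a quotation of it.

```latex
\begin{align*}
\frac{\dot{\lambda}_R}{\sqrt{1-\lambda_R^2}} &= \dot{\phi}_1 - \sigma_1
  \quad\Longrightarrow\quad
  \frac{d}{dt}\,\sin^{-1}\lambda_R = \dot{\phi}_1 - \sigma_1, \\
\sin^{-1}\lambda_R(t) &= \phi_1(t) - \phi_{10} - \int_0^t \sigma_1(\tau)\,d\tau + C_0
  = \phi_1(t) + \theta_1(t) + C_0, \\
\lambda_R(t) &= \sin\bigl(\phi_1 + \theta_1 + C_0\bigr), \qquad
\frac{\lambda_1+\lambda_2}{R} = \sqrt{1-\lambda_R^2}
  = \cos\bigl(\phi_1 + \theta_1 + C_0\bigr).
\end{align*}
```

The last equality holds as long as the cosine is nonnegative, which is where the assumed choice of root enters.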

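Without loss of generality, we may assume that $\sigma_1(\tau) = \sigma_{10}^* =$ constant for
$0 < \tau \le t_1$, where $t_1$ is the first switching time greater than zero.

Then,

$\lambda_1(t) = \lambda_1(0) + \frac{v_p}{\sigma_{10}^*}\sin(-\phi_{10} - \sigma_{10}^* t + C_0),$  (4.40)

where

$\sigma_{10}^* = -\mathrm{sgn}[\lambda_{10}].$  (4.41)

The value of $t_1$ can be obtained by letting $\lambda_1(t_1) = 0$. In our case,

$t_{1k} = \frac{(C_0 - \phi_{10}) + \sin^{-1}\left(\lambda_{10}\sigma_{10}^*/v_p\right)}{\sigma_{10}^*}$

and

$t_1 = \min_{t_{1k} > 0} t_{1k}.$  (4.42)

The subsequent switching times can be calculated analogously.

In general, suppose that $\{t_k;\ k = 1, 2, \ldots\}$ is a sequence of switching times and
that

$\phi_{1k} = \phi_1(t_k),$  (4.43)

$\lambda_{Rk} = \lambda_R(t_k),$  (4.44)

$\lambda_{1k} = \lambda_1(t_k),$  (4.45)

$\sigma_{1k}^* = -\mathrm{sgn}[\lambda_{1k}].$  (4.46)

Then, on $(t_k, t_{k+1}]$, we have

$\dot{\lambda}_1(t) = -v_p\cos(\theta_{k+1}(t) + C_k),$  (4.47)

$\lambda_R(t) = \sin(\phi_1(t) + \theta_{k+1}(t) + C_k),$  (4.48)

$\theta_{k+1}(t) = -\phi_1(t_k) - \int_{t_k}^{t} \sigma_1^*(\tau)\,d\tau,$  (4.49)

$\sigma_1^*(t) = -\mathrm{sgn}[\lambda_1(t)].$  (4.50)

Notice that on $(t_{k-1}, t_k]$ we have

$\lambda_R(t) = \sin(\phi_1(t) + \theta_k(t) + C_{k-1}),$  (4.51)

and

$\theta_k(t) = -\phi_1(t_{k-1}) - \int_{t_{k-1}}^{t} \sigma_1^*(\tau)\,d\tau.$  (4.52)

Again, on $(t_{k-1}, t_k]$, $\sigma_1^*(\tau) = \sigma_{1,k-1}^* =$ constant, thus we have

$\theta_k(t_k) = -\phi_1(t_{k-1}) - \sigma_{1,k-1}^*(t_k - t_{k-1}).$  (4.53)

Thus,

$\lambda_{Rk}^- = \lambda_R(t_k) = \sin(\phi_1(t_k) + \theta_k(t_k) + C_{k-1}),$  (4.54)

$\lambda_{Rk}^+ = \lim_{t \to t_k^+} \lambda_R(t)
   = \lim_{t \to t_k^+} \sin(\phi_1(t) + \theta_{k+1}(t) + C_k)
   = \sin(\phi_1(t_k) - \phi_1(t_k) + C_k)
   = \sin C_k.$  (4.55)

Letting $\lambda_{Rk}^+ = \lambda_{Rk}^-$ yields

$\sin(C_k) = \sin(\phi_1(t_k) + \theta_k(t_k) + C_{k-1}),$  (4.56)

which is used to determine $C_k$.

Eqs. (4.43)-(4.50) together give an evaluation of an extremal bang-bang
trajectory based on the choice of $[\lambda_{R0}, \lambda_{10}]$.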
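A forward-time evaluation of one extremal can be sketched as follows. The sketch assumes that the branch $C_k = \phi_1(t_k) + \theta_k(t_k) + C_{k-1}$ of (4.56) is taken at every switch, so that $\theta_{k+1}(t) + C_k = C_0 - \phi_{10} - \int_0^t \sigma_1^*(\tau)\,d\tau$ on every interval and $\dot{\lambda}_1$ keeps the form (4.39) throughout; it also replaces the closed-form expressions by plain Euler integration. The function name `escape_time`, the step size, the time limit, and the default numbers are illustrative and not taken from the report.

```python
import math

def escape_time(lam10, lamR0, R0=100.0, phi10=0.52, phi20=1.0,
                vp=25.0, ve=20.0, sigma2_hat=0.55, Rf=200.0,
                dt=1e-3, t_max=30.0):
    """Time at which R first reaches Rf along the extremal started from
    the costate values (lam10, lamR0); math.inf if not reached by t_max.

    Assumption of this sketch: psi(t) = C0 - phi10 - integral of sigma1,
    with C0 = asin(lamR0), carries the phase across all switches.
    """
    R, phi1, phi2 = R0, phi10, phi20
    lam1 = lam10
    psi = math.asin(lamR0) - phi10
    t = 0.0
    while t < t_max:
        if R >= Rf:
            return t
        if R <= 1e-6:
            return math.inf          # players collided; no escape
        # Bang-bang law (4.50); zero control exactly on the switching surface.
        sigma1 = -math.copysign(1.0, lam1) if lam1 != 0.0 else 0.0
        los = (vp * math.sin(phi1) + ve * math.sin(phi2)) / R
        R += dt * -(vp * math.cos(phi1) + ve * math.cos(phi2))
        phi1 += dt * (los + sigma1)
        phi2 += dt * (los + sigma2_hat)
        lam1 += dt * (-vp * math.cos(psi))   # form (4.39) of the costate rate
        psi += dt * (-sigma1)
        t += dt
    return math.inf
```

Because only forward integration is involved, the cost of one evaluation grows linearly with the simulated time, which is what makes a derivative-free search over the two initial costates practical.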

REMARKS:

Dependence of the control $\sigma_1$ on $\sigma_2$ is implicit. The corresponding control law
can be evaluated without retrograde integration. Thus, the saving in computation
time is significant. Since the control law is a function of the initial values of the
costates $\lambda_R$ and $\lambda_1$, the final time is also a function of the initial values of
$\lambda_R$ and $\lambda_1$. Various non-derivative optimization techniques can be employed to
find the optimal values for $\lambda_1$ and $\lambda_R$. The so-called Box's algorithm will be
used for our problem because it is well known to handle inequality constraints and
a nonlinear objective function.

4.3. Optimization Technique

4.3.1.  Box’s  Complex  Algorithm 
The  Problem 

This algorithm finds the maximum of a multivariable, nonlinear function
subject to nonlinear inequality constraints:

$\max F(x_1, x_2, \ldots, x_N)$  (4.57)

subject to

$G_k \le x_k \le H_k, \quad k = 1, 2, \ldots, N,$

$f_i(x_1, x_2, \ldots, x_N) \le x_i \le F_i(x_1, x_2, \ldots, x_N), \quad i = N+1, \ldots, M,$

where the functions $f_i(x_1, x_2, \ldots, x_N)$, $F_i(x_1, x_2, \ldots, x_N)$ are dependent
functions of the explicit independent variables $x_1, x_2, \ldots, x_N$. The upper and
lower constraints $H_k$ and $G_k$ are either constants or functions of the independent
variables.

Method 

The procedure is based on the “complex” method of M. J. Box. This method
is a sequential search technique which has proven effective in solving problems
with nonlinear objective functions subject to nonlinear inequality constraints. No
derivatives are required. The procedure should tend to find the global maximum
due to the fact that the initial set of points is randomly scattered throughout
the feasible region. If linear constraints are present or equality constraints are
involved, other methods should prove to be more efficient. The algorithm proceeds
as follows:

1. An original “complex” of $K \ge N + 1$ points is generated consisting of a
feasible starting point and $K - 1$ additional points generated from random
numbers and constraints for each of the independent variables

$x_{i,j} = G_i + r_{i,j}(H_i - G_i),$  (4.58)

$i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, K - 1,$

where $r_{i,j}$ are random numbers between 0 and 1.

2. The selected points must satisfy both the explicit and implicit constraints. If
at any time the explicit constraints are violated, the point is moved a small
distance $\delta$ inside the violated limit. If an implicit constraint is violated,
the point is moved one half of the distance to the centroid of the remaining
points

$x_{i,j}(\mathrm{new}) = (x_{i,j}(\mathrm{old}) + x_{i,c})/2, \quad i = 1, 2, \ldots, N,$  (4.59)

where the coordinates of the centroid of the remaining points, $x_{i,c}$, are defined
by

$x_{i,c} = \frac{1}{K-1}\left[\sum_{k=1}^{K} x_{i,k} - x_{i,j}(\mathrm{old})\right], \quad i = 1, 2, \ldots, N.$  (4.60)

This process is repeated as necessary until all the implicit constraints are
satisfied.

3. The objective function is evaluated at each point. The point with the lowest
function value is rejected, and the new point is located at a distance $\alpha$ times
as far from the centroid of the remaining points as the distance of the rejected
point, on the line joining the rejected point and the centroid

$x_{i,j}(\mathrm{new}) = \alpha(x_{i,c} - x_{i,j}(\mathrm{old})) + x_{i,c}, \quad i = 1, 2, \ldots, N.$  (4.61)

4. If the new point repeats in giving the lowest function value on consecutive
trials, it is moved one half the distance to the centroid of the remaining
points.

5. The new point is checked against the constraints and is adjusted as before
if the constraints are violated.

6. Convergence is assumed when the objective function values at each point
are within $\beta$ units for $\gamma$ consecutive iterations. An iteration is defined as
the calculations required to select a new point which satisfies the constraints
and does not repeat in yielding the lowest function value.
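The six steps above can be sketched in Python as follows. This is a minimal illustration that handles only the explicit (bound) constraints; implicit constraints are omitted, and the function name `box_complex`, the iteration cap, the cap on contractions, and the fixed random seed are assumptions of the sketch rather than part of the report.

```python
import random

def box_complex(f, lower, upper, start, K=4, alpha=1.3, beta=1e-3,
                gamma=3, delta=0.01, max_iter=500, seed=0):
    """Maximize f over the box lower <= x <= upper with Box's complex method.

    `start` must be feasible. Returns (best_point, best_value).
    """
    rng = random.Random(seed)
    n = len(lower)

    def clip(x):
        # Step 2: a coordinate outside its limits is moved delta inside.
        return [min(max(xi, lo + delta), hi - delta) if not lo <= xi <= hi else xi
                for xi, lo, hi in zip(x, lower, upper)]

    # Step 1: feasible start plus K-1 random points, eq. (4.58).
    pts = [list(start)]
    for _ in range(K - 1):
        pts.append([lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)])
    vals = [f(p) for p in pts]

    calm = 0
    for _ in range(max_iter):
        worst = min(range(K), key=vals.__getitem__)
        # Centroid of the remaining K-1 points, eq. (4.60).
        centroid = [sum(p[i] for j, p in enumerate(pts) if j != worst) / (K - 1)
                    for i in range(n)]
        # Step 3: over-reflect the rejected point through the centroid, eq. (4.61).
        new = clip([alpha * (c - x) + c for c, x in zip(centroid, pts[worst])])
        new_val = f(new)
        others_min = min(v for j, v in enumerate(vals) if j != worst)
        # Step 4: while the new point is still the worst, halve toward the centroid.
        tries = 0
        while new_val < others_min and tries < 50:
            new = [(x + c) / 2.0 for x, c in zip(new, centroid)]
            new_val = f(new)
            tries += 1
        pts[worst], vals[worst] = new, new_val
        # Step 6: all values within beta for gamma consecutive iterations.
        calm = calm + 1 if max(vals) - min(vals) < beta else 0
        if calm >= gamma:
            break

    best = max(range(K), key=vals.__getitem__)
    return pts[best], vals[best]
```

Since the method maximizes, a cost that is to be minimized can be passed as its negative, for example `box_complex(lambda x: -cost(x), lower, upper, start)` with a user-supplied `cost`.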


4.3.2.  Implementation 


For our problem, Box's complex algorithm is used to find the minimum of a
multivariable nonlinear function subject to a set of nonlinear equality constraints

$\min J(\lambda_{10}, \lambda_{R0}),$  (4.62)

subject to

$-9.0 \le \lambda_{10} \le 9.0,$

$-1.0 \le \lambda_{R0} \le 1.0,$

$\dot{R} = -(v_p\cos\phi_1 + v_e\cos\phi_2),$

$\dot{\phi}_1 = (v_p\sin\phi_1 + v_e\sin\phi_2)/R + \sigma_1,$

$\dot{\phi}_2 = (v_p\sin\phi_1 + v_e\sin\phi_2)/R + \sigma_2,$

$\sigma_1 = -\mathrm{sgn}[\lambda_1(t)],$

$\lambda_1(t) = \lambda_1(t_k) + \frac{v_p}{\sigma_{1k}^*}\sin(-\phi_{1k} - \sigma_{1k}^*(t - t_k) + C_k), \quad t \in (t_k, t_{k+1}],$

$\phi_{1k} = \phi_1(t_k), \quad t \in (t_k, t_{k+1}],$

$\sigma_{1k}^* = -\mathrm{sgn}[\lambda_1(t_k)], \quad t \in (t_k, t_{k+1}],$

$\sin C_k = \sin(\phi_1(t_k) + \theta_k(t_k) + C_{k-1}),$

$C_0 = \sin^{-1}(\lambda_{R0}),$

where $J(\lambda_{10}, \lambda_{R0}) = t_f - t_0$, $t_0$ is the initial time value and $t_f$ is the
terminal time at which the distance between the two players reaches a given value.
Therefore, in our case, we have $N = 2$, $M = K = 2$.

For our problem, the following parameters have been used:

$G_1 = -9.0,$
$H_1 = 9.0,$
$G_2 = -1.0,$
$H_2 = 1.0,$
$K = 4,$
$\beta = 0.001,$
$\alpha = 1.3,$
$\gamma = 3,$
$\delta = 0.01.$

We have two independent variables, $\lambda_{10}$ and $\lambda_{R0}$, and we do not have constraint
functions of explicit independent variables. In our implementation, after we randomly
generate a complex of $K$ points, we compare the values at each point of the complex
with that at the centroid of all points. If the value at the centroid is the
highest, we reselect the complex of starting points until the highest value does
not occur at the centroid of all points. Figures 4.6 and 4.7 show the simulation
results of our problem.


In Figures 4.6 and 4.7, lmbdaR and lmbda1 represent the values of $\lambda_R$ and
$\lambda_{10}$. The solid line represents the trajectory of the evader and the dashed line
represents the trajectory of the pursuer. From these plots, we can see clearly how
the values of $\lambda_R$ and $\lambda_{10}$ affect the trajectories of both players.


The parameters used in our simulation are:

The speed of evader:              $v_e$ = 20.0 (units),
The speed of pursuer:             $v_p$ = 25.0 (units),
Initial distance:                 $R_0$ = 100.0,
Final distance:                   $R_f$ = 200.0,
Initial off-boresight angle:      $\phi_1$ = 0.52,
Initial off-boresight angle:      $\phi_2$ = 57.33,
Fixed control for the pursuer:    $\sigma_2$ = 0.55.

The goal of the optimization technique is to minimize the time in which $R$
increases to a given value $R_f$.

From the above, we know that although the optimal control law is obtained
assuming that the speeds of both players are the same, the control law also works
well for the case with different speeds.

In the actual implementation, the process of control evaluation is as
follows. As shown in Figure 4.4, $[t_1, t_5]$ is one sample period. In this sample
period, $[t_1, t_2]$ is the subinterval for identifying $\hat{\sigma}_2$. Once $\hat{\sigma}_2$ is
obtained, an optimization process using Box's complex algorithm is carried out for
the evader to obtain a sequence of optimal control values $\{u(t_3), u(t_6), \ldots\}$.
Once the optimization process is completed, a control value from the sequence is
issued to govern the next movement. This whole process in $[t_1, t_5]$ repeats for
subsequent time intervals (see Figure 4.4). However, if $\hat{\sigma}_2$ is assumed fixed
for the entire encounter, a slightly different scheme (see Figure 4.5) will be used,
in which only one identification process and one optimization process are needed
and control values are issued following the optimization process.


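[Figure 4.4: Implementation Scheme 1. Time axis with ticks $t_1$, $t_3$, $t_5$, $t_7$;
each sample period contains a segment labelled "optimization process" followed by
a segment labelled "C", and the pattern repeats.]

[Figure 4.5: Implementation Scheme 2. Time axis with a single segment labelled
"optimization process" followed by successive segments labelled "C".]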
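The repeating cycle of Implementation Scheme 1 can be sketched as a simple loop. The callables `identify`, `optimize`, and `issue`, as well as the choice to issue the first element of the optimized sequence, are hypothetical stand-ins introduced for illustration; the report does not specify these interfaces.

```python
def run_scheme_1(identify, optimize, issue, n_periods):
    """Repeat identify -> optimize -> issue once per sample period.

    identify(period) -> estimated pursuer turning rate (sigma2_hat)
    optimize(sigma2_hat) -> non-empty sequence of evader control values
    issue(u) -> applies one control value
    Returns the list of control values that were issued.
    """
    issued = []
    for period in range(n_periods):
        sigma2_hat = identify(period)      # identification subinterval
        controls = optimize(sigma2_hat)    # e.g. a search over initial costates
        u = controls[0]                    # assumption: issue the first value
        issue(u)
        issued.append(u)
    return issued
```

Scheme 2 would correspond to calling `identify` and `optimize` once and then issuing the resulting control values in sequence.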


5.  Conclusion 


It is said that the community of Automatic Control is in an evolutionary phase
rather than a revolutionary one. This seems to be true. But although we
experienced the astonishing revolutionary phase of this field in the early sixties, we
should not overlook the great progress we have made and the remarkable
accomplishments we have achieved since then. We should not forget that each single
step in this long march was made by the joint efforts of each individual in our society.


It is the author's hope that this dissertation will be one of those small
contributions towards the progress of modern technologies in the field of artificial
intelligence methodologies in control systems. The second chapter of this
dissertation mathematically formulated the control systems inside the neural networks.
Introducing a small feedback loop INSIDE each neuron, instead of a feedback
connection in the network, we presented the discretized version of recurrent neural
networks. Using these types of neural networks, we showed how to use the
internal states directly to construct a feedback control law. What is more important,
a network of this type is itself a system and not an unknown "Black Box", and
thus its input-output performance can be studied just as is the case for a classical
control system. Therefore, many conventional synthesis methods can be directly
borrowed to design a controller.


The third chapter of this dissertation discusses the issues of applying neural
network techniques to classical differential game problems. To model the real-time
game situation more realistically, a configuration based on the stages in real-life
conflicts is proposed. Based on the paradigm of semantic control, the
configuration can be used further to derive two paradigms of differential games with
neural networks. To demonstrate the effectiveness of the method, we carried out
a simulation experiment and studied a pursuit-evasion game problem. The same
principle, paradigm and structure have the potential of being applicable to an
entire class of pursuit-evasion game problems. We also studied the external
learning algorithm for a neural controller, which may be used in one of the paradigms
discussed previously. To test the algorithm, a real-time aircraft control problem
in the presence of windshear has been studied.

The fourth chapter of this dissertation has discussed the Layered Defense
Project. The project, which was initiated in June 1991, is a real-time
pursuit-evasion game problem with one evader and multiple pursuers. Based on
line-of-sight coordinates, this dissertation has discussed and solved the optimal
control problem arising from the project. Box's algorithm has been used to find the
optimal values for the costates.

Here, the following areas seem to be of sufficient interest to indicate further
research and investigation:

1. to consider a robust, multi-purpose, real-time controller for various types of
applications and problems, a product of a merger of the advanced techniques
in the areas of artificial intelligence and control systems,

2. to more intelligently and massively incorporate parallel computer
architecture into a control system, e.g., using neural networks,

3. to incorporate the graphical interface into a workstation-based control
system,

4. to establish and fully utilize a data-base for knowledge representation and
knowledge processing,

5. to more intelligently apply neural network techniques for control systems,
which is a promising area since the learning capacity, robustness, and
memory capacity of neural networks provide revolutionary tools and mechanisms
for the future of control systems.

To summarize, this dissertation makes the following contributions to the
area of artificial intelligence methodologies in aerospace and other control systems:

1. Although several applications of recurrent neural networks to control
systems have been reported [73, 74, 93], no one in the past few years has
studied the issues of controllability/observability and linearizability
via change of coordinates for this type of neural network in control
systems. This dissertation has covered these interesting topics and the results
are satisfactory. For the first time, the so-called Separation Principle of
Learning and Control is proposed. The significance of this study lies in
exploring the intelligence/learning capacity and parallel architectures
of neural networks for the purpose of control. This study has shown
the promise for future research in this direction.

2. Motivated by the works in [77, 95], we have developed a new approach to
differential games with neural networks. The approach, which is based on
the semantic control theory, is closer to real-life conflicts and has
the potential of being applicable to an entire class of pursuit-evasion game
problems. The study is significant for the community of differential games.
In our study, the assumption that both players act optimally at all times
is no longer valid. This is particularly true for air-combat problems, since
during real combat modeling an entire encounter in a fixed mathematical
equation is essentially not realistic, and both players (assuming only two
players in the encounter) always act according to their opponent's perceived
action and their own goals. Therefore, the configuration based on this idea is
much closer to the real-time situation and can be used to develop more
advanced algorithms.

3. The aircraft control in the presence of windshear is an interesting
problem. The learning algorithm, which was originally developed for the neural
controller in Chapter 3, is applied to the control of aircraft encountering
windshear. Explicit formulas for evaluating weights in a neural controller have
been given. The approach offers advantages such as being easy to
implement in practice and being applicable to several different windshear models
without any change of control law.

4. Chapter 4 of this dissertation has discussed another aspect of artificial
intelligence techniques for control systems: a rule-based expert system applied to a
class of pursuit-evasion game problems. Line-of-sight coordinates have been
used by several authors, such as Shinar [82, 81], in the study of pursuit-evasion
game problems. Based on the line-of-sight coordinates, this dissertation has
discussed and solved the optimal control problem arising from the project.
There are three main differences between their approaches and ours. First,
their solution requires retrointegration of the costate equations, which is usually
very time-consuming. The derived optimal control solution for our problem
has an explicit formula which can be implemented in time-forward
fashion. Thus, the time saved in implementing the solution is significant, since it
does not require the retrointegration which is normally performed in computing
optimal control solutions. Second, Box's complex algorithm has been
incorporated into our optimal control problem. Third, in our approach, the
pursuer's strategy is assumed to be known and fixed during the evaluation of
optimal control, while in Shinar's work the control strategies for both pursuer
and evader are evaluated simultaneously. This observation suggests the potential
application of our results in Chapter 3 to this particular project. From the above,
we can see that our approach does offer several advantages over previous work.